Discussion:
Clones (was RE: Hexadecimal)
J***@aculab.com
2003-08-18 10:17:37 UTC
All of this makes sense to me, apart from one or two tiny niggling points...

I confess, I hadn't read ch14.pdf, and I probably should have done. My
fault. But I still believe that there should be something in the
machine-readable code charts themselves that says, of the Roman numerals,
"Don't use these characters - use the normal Latin letters instead". If
they really are there _SOLELY_ for round trip compliance with East Asian
standards, then, if I wish to put the year MMIII in a web page, I should
_NOT_ use the Roman letters. Furthermore, if I write software to interpret
Roman Numbers, I only need to interpret the Basic Latin letters, not the
Roman ones. My life as a webmaster and programmer is made so much SIMPLER by
not having to use the Roman letters. I would really like it if these, and
every single other character which is "only there for reasons of round trip
compatibility" with something else, were explicitly marked in the
machine-readable charts with something meaning "Don't introduce this
character, at all, ever. Don't try to interpret it. Just preserve it, in
case it ever gets turned back to its original character set".

Secondly, I believe that the code charts SHOULD provide machine-readable
information about the hexadecimal values of the letters "A" to "F".
Codepoint FF21, for example, has the property "Hex_Digit". Now, I _could_
parse the textual description in the rest of the line ("FULLWIDTH LATIN
CAPITAL LETTER A"), deduce that this can be replaced by "A", and then use
the ASCII algorithm to convert this to ten ... but it would be SO MUCH NICER
if _every_ character (or range of characters) which had the "Hex_Digit"
property ALSO had a simple, straightforward, lookup table, which immediately
told me that, when interpreted as hex, this symbol means ten.

Thirdly, as Jim pointed out, specialist disciplines should not expect
characters to be cloned all over the place just because they have a
different meaning in their particular discipline. I do agree with this, but
what confuses me is what APPEAR to be the large number of violations of this
rule already present in Unicode. For example:
U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who
uses this?
U+2217 (asterisk operator) - an equally obvious clone of U+002A
(asterisk)
U+223C (tilde operator) - a clone of U+007E (tilde)
and then there's:
U+2223 (divides) - hell, that looks to me remarkably like U+007C
(vertical line)

Conversely, there are also things that look different, but mean the same.
For example:
U+2264 (less than or equal to) - compare with U+2A7D (less than or
slanted equal to)

The last example is interesting (to me) because the difference between the
two seems like a font difference - like the difference between "g" with a
tail and "g" with a loop. In defence of this argument, I point out that the
complementary relation, NOT less than or equal to, has codepoint U+2270, and
this is represented in the code charts as having a slanted equal to, so it
OUGHT to be the complement of U+2A7D. (Unless I've missed it, there appears to
be no "not less than or equal to with horizontal equals" character).

So, yes, I agree with Jim. Let's not have too many duplicates. But I still
have to ask why there are so many already?




-----Original Message (1)-----
From: Doug Ewell [mailto:***@adelphia.net]
Sent: Saturday, August 16, 2003 9:14 PM
To: Unicode mailing list
Cc: Pim Blokland
Subject: Re: Hexadecimal

Not exactly. The character U+216E ROMAN NUMERAL FIVE HUNDRED came from
an East Asian double-byte character set, and was carried over into
Unicode for round-tripping reasons. It is a compatibility equivalent of
U+0044.



AND...
-----Original Message (2)-----
From: Jim Allan [mailto:***@smrtytrek.com]
Sent: Saturday, August 16, 2003 9:13 PM
To: ***@unicode.org
Subject: Re: Hexadecimal

.... from an explanation as to why Unicode
coded Roman numerals separately. See 14.3 at
http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf:

<< Number form characters are encoded solely for compatibility with
existing standards. >>

Also

<< Roman Numerals. The Roman numerals can be composed of sequences of
the appropriate Latin letters. Upper- and lowercase variants of the
Roman numerals through 12, plus L, C, D, and M, have been encoded for
compatibility with East Asian standards. >>



AND FINALLY...
-----Original Message (3)-----
From: Jim Allan [mailto:***@smrtytrek.com]
Sent: Saturday, August 16, 2003 9:13 PM
To: ***@unicode.org
Subject: Re: Hexadecimal

Anyone at any time in any discipline can assign a special meaning to a
Latin letter without waiting for this meaning to be encoded in Unicode
and should not expect that a clone of the character with that special
meaning would ever be encoded in Unicode.



------------------------ Yahoo! Groups Sponsor ---------------------~-->
KnowledgeStorm has over 22,000 B2B technology solutions. The most comprehensive IT buyers' information available. Research, compare, decide. E-Commerce | Application Dev | Accounting-Finance | Healthcare | Project Mgt | Sales-Marketing | More
http://us.click.yahoo.com/IMai8D/UYQGAA/cIoLAA/8FfwlB/TM
---------------------------------------------------------------------~->

To Unsubscribe, send a blank message to: unicode-***@yahooGroups.com

This mailing list is just an archive. The instructions to join the true Unicode List are on http://www.unicode.org/unicode/consortium/distlist.html


Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
John Cowan
2003-08-18 15:38:59 UTC
Post by J***@aculab.com
"Don't use these characters - use the normal Latin letters instead".
That's essentially the implication of being a compatibility character.
Post by J***@aculab.com
Secondly, I believe that the code charts SHOULD provide machine-readable
information about the hexadecimal values of the letters "A" to "F".
0030;0
0031;1
0032;2
0033;3
0034;4
0035;5
0036;6
0037;7
0038;8
0039;9
0041;10
0042;11
0043;12
0044;13
0045;14
0046;15
0061;10
0062;11
0063;12
0064;13
0065;14
0066;15
FF10;0
FF11;1
FF12;2
FF13;3
FF14;4
FF15;5
FF16;6
FF17;7
FF18;8
FF19;9
FF21;10
FF22;11
FF23;12
FF24;13
FF25;14
FF26;15
FF41;10
FF42;11
FF43;12
FF44;13
FF45;14
FF46;15

Thuryago.
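For anyone who wants the table above in executable form, here is a minimal Python sketch. The names `build_hex_table`, `HEX_VALUE` and `parse_hex` are made up for illustration; the ranges are simply the 44 code points listed above, nothing more.

```python
def build_hex_table():
    """Map each of the 44 code points listed above to its hex value."""
    table = {}
    # Digits 0-9: ASCII (U+0030..) and fullwidth (U+FF10..).
    for base in (0x0030, 0xFF10):
        for i in range(10):
            table[chr(base + i)] = i
    # Letters A-F and a-f: ASCII and fullwidth, upper and lower case.
    for base in (0x0041, 0x0061, 0xFF21, 0xFF41):
        for i in range(6):
            table[chr(base + i)] = 10 + i
    return table


HEX_VALUE = build_hex_table()


def parse_hex(text):
    """Interpret a string of hex digits; raises KeyError on other characters."""
    value = 0
    for ch in text:
        value = value * 16 + HEX_VALUE[ch]
    return value
```

With this, `parse_hex` treats a fullwidth digit exactly like its ASCII counterpart, and mixed strings work too.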
Post by J***@aculab.com
U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who
uses this?
The ASCII characters, because they have had to do double or triple
duty over the years when we had a very limited 7-bit character set,
often have several near-equivalents in Unicode that disambiguate their
*typographically* different purposes. Thus hyphen, minus sign, en dash,
and em dash have separate Unicode representations, though in ASCII they
are often written -, -, -- or -, and --- or -- respectively.
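As a rough illustration of going from the ASCII conventions to the separate Unicode characters, here is a small Python sketch. It assumes the simplest reading of the conventions above ("---" is always an em dash, "--" is always an en dash); the function name is hypothetical. A single "-" is left alone, since nothing in the text says whether it stands for a hyphen or a minus.

```python
EM_DASH = "\u2014"
EN_DASH = "\u2013"


def ascii_dashes_to_unicode(text):
    # Replace the longer run first so "---" is not consumed as "--" plus "-".
    return text.replace("---", EM_DASH).replace("--", EN_DASH)
```

Where "--" is used for an em dash instead, the mapping would have to change, which is exactly the ambiguity the separate code points remove.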
Post by J***@aculab.com
Conversely, there are also things that look different, but mean the same.
U+2264 (less than or equal to) - compare with U+2A7D (less than or
slanted equal to)
It turns out that in some math contexts one or the other is strongly enough
preferred that it's worth having two characters so as to avoid getting the
"wrong" glyph.
Post by J***@aculab.com
So, yes, I agree with Jim. Let's not have too many duplicates. But I still
have to ask why there are so many already?
"History there is, and no history."
--The High Inquest

"Every character has its story."
--various Unicode tribal elders
--
John Cowan <***@reutershealth.com>
http://www.reutershealth.com http://www.ccil.org/~cowan
.e'osai ko sarji la lojban.
Please support Lojban! http://www.lojban.org


Jim Allan
2003-08-18 16:06:37 UTC
Post by J***@aculab.com
I would really like it if these, and
every single other character which is "only there for reasons of round trip
compatibility" with something else, were explicitly marked in the
machine-readable charts with something meaning "Don't introduce this
character, at all, ever. Don't try to interpret it. Just preserve it, in
case it ever gets turned back to its original character set".
That would probably be too strong.

If characters are available then some people will use them. :-(

See section 2.3 at http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf

Unicode 3.0 contained under section D21 on compatibility characters:

<< Their use is discouraged other than for legacy data. >>

I don't know whether this statement was intentionally removed or was
accidentally dropped in the changes in 4.0 which distinguish
"compatibility character" from "compatibility composite character".

In any case people can't be prevented from doing things that are
officially discouraged, especially as for some particular use it might
be wrong to discourage them. So if you are handling Roman numerals in an
application and wish your handling to be complete then unfortunately you
do have to take the compatibility Roman numerals into account.
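One way to take the compatibility Roman numerals into account without writing a second parser is to fold them to Latin letters first. The following is only a sketch in Python: `roman_to_int` and `ROMAN_VALUES` are hypothetical names, it relies on the standard `unicodedata` module for NFKC normalization, and it does no validation of malformed numerals.

```python
import unicodedata

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    # NFKC replaces compatibility characters such as U+216F with Latin letters.
    letters = unicodedata.normalize("NFKC", text).upper()
    total = 0
    for i, ch in enumerate(letters):
        value = ROMAN_VALUES[ch]  # KeyError for anything that is not a numeral
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if i + 1 < len(letters) and value < ROMAN_VALUES[letters[i + 1]]:
            total -= value
        else:
            total += value
    return total
```

The same function then accepts "MMIII" written with Basic Latin letters and the same year written with U+216F U+216F U+2162.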
Post by J***@aculab.com
U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who
uses this?
People concerned with proper appearance of the symbol in proportional
fonts. Almost all proportional fonts use a narrow hyphen dash rather
than a minus-width dash for the hyphen-minus character. In some
older-style fonts it is even a slanting character.

See http://www.unicode.org/versions/Unicode4.0.0/ch06.pdf in 6.2 for a
detailed discussion of the various dash characters.
Post by J***@aculab.com
U+2217 (asterisk operator) - an equally obvious clone of U+002A
(asterisk)
They look much the same in a typewriter style font. They don't do so in
proportional fonts where the regular asterisk tends to appear somewhat
like a superscript.

Unicode provides support both for good typographical usage as well as
traditional data-processing typographical usage based on
typewriter technology.
Post by J***@aculab.com
U+223C (tilde operator) - a clone of U+007E (tilde)
See http://www.unicode.org/versions/Unicode4.0.0/ch07.pdf and look for
"Spacing Clones of Diacritics".

The ASCII tilde was originally intended to be a non-spacing diacritic
tilde to be applied to other characters by backspace. In part because of
the low resolution of many early data-processing printers it was often
realized in a tilde operator form. That has now become its most normal
form in fonts.

But for good typography you do want to distinguish them and the
overloading of tilde as ASCII 7E means that a font may render a
mathematical full-character tilde when you want to show a diacritic or
render a spacing diacritic when you wanted a mathematical operator.

Unicode is intended for typesetting applications as well as entering
computer code in a traditional typewriter style character set with
typewriter limitations.
Post by J***@aculab.com
and then there's
U+2223 (divides) - hell, that looks to me remarkably like U+007C
(vertical line)
They do look close. But U+007C usually extends below the baseline and
U+2223 usually doesn't.
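For the four pairs discussed so far, the character database can be queried for the name and General_Category of each member. The sketch below uses Python's standard `unicodedata` module; `PAIRS` and `describe` are names invented here. It only reports what the database records and says nothing about how any font draws the glyphs.

```python
import unicodedata

# (ASCII character, mathematical operator) pairs from the discussion above.
PAIRS = [("\u002D", "\u2212"), ("\u002A", "\u2217"),
         ("\u007E", "\u223C"), ("\u007C", "\u2223")]


def describe(ch):
    # Builds a string of the form "U+XXXX NAME [General_Category]".
    return "U+%04X %s [%s]" % (ord(ch), unicodedata.name(ch), unicodedata.category(ch))


for ascii_char, operator in PAIRS:
    print(describe(ascii_char), "<->", describe(operator))
```

For the first pair this reports HYPHEN-MINUS as dash punctuation (Pd) and MINUS SIGN as a math symbol (Sm), so at least there the two are distinguished in machine-readable data as well as on paper.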
Post by J***@aculab.com
U+2264 (less than or equal to) - compare with U+2A7D (less than or
slanted equal to)
I have no idea. You will probably have to ask the MathML people about
that one. See http://www.w3.org/TR/2001/REC-MathML2-20010221.
Mathematicians seem to think they need to distinguish the two.

As a non-mathematician I find many of these distinctions bewildering and
seemingly only typographical. But if mathematicians in some field make
fine distinctions based on such differences then it is important that
Unicode allow such distinctions to be maintained in plain text.
Post by J***@aculab.com
In defence of this argument, I point out that the
complementary relation, NOT less than or equal to, has codepoint U+2270, and
this is represented in the code charts as having a slanted equal to, so it
OUGHT to be the complement of U+2A7D. (Unless I've missed it, there appears to
be no "not less than or equal to with horizontal equals" character).
The chart at http://www.unicode.org/charts/PDF/U2200.pdf does not show a
slanted equals.
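Independently of how the glyph is drawn in any chart, the canonical decomposition of U+2270 says which character it negates. A short check in Python with the standard `unicodedata` module (the variable names are just for illustration):

```python
import unicodedata

NOT_LEQ = "\u2270"  # NEITHER LESS-THAN NOR EQUAL TO

print(unicodedata.name(NOT_LEQ))
# Canonical decomposition, given as hexadecimal code points.
print(unicodedata.decomposition(NOT_LEQ))
decomposed = unicodedata.normalize("NFD", NOT_LEQ)
print([unicodedata.name(c) for c in decomposed])
```

The decomposition is U+2264 followed by U+0338 COMBINING LONG SOLIDUS OVERLAY, so as far as the database is concerned U+2270 is the negation of U+2264 with the horizontal bar, not of U+2A7D.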

For some discussion of the math symbols see also
http://www.unicode.org/unicode/reports/tr25/tr25-5.html.

Part of the problem is that differences that are in most environments
only typographical style differences may indicate semantic differences
in particular disciplines. It is impossible to establish a firm line as
to how important or common what would normally be a stylistic variation
must be before it should be encoded in Unicode for plain text distinctions.

For example open-loop _g_ is distinguished from closed-loop _g_ in the
International Phonetic Alphabet and so Unicode encodes it separately at
U+0261.

A normal Latin Letter font would probably not have U+0261 in it at all
and might display U+0067 with either closed or open loop. But a font for
phonetic use should always display U+0067 with a closed loop.

Fonts like Arial Unicode MS lose the distinction.

For non-technical use people need not and mostly quite rightly will not
use the more technical symbols to make fine distinctions that don't
apply in their particular usage.

Jim Allan

Peter Kirk
2003-08-18 16:43:53 UTC
Post by Jim Allan
Post by J***@aculab.com
I would really like it if these, and
every single other character which is "only there for reasons of round trip
compatibility" with something else, were explicitly marked in the
machine-readable charts with something meaning "Don't introduce this
character, at all, ever. Don't try to interpret it. Just preserve it, in
case it ever gets turned back to its original character set".
That would probably be too strong.
If characters are available then some people will use them. :-(
See section 2.3 at http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf
<< Their use is discouraged other than for legacy data. >>
I don't know whether this statement was intentionally removed or was
accidentally dropped in the changes in 4.0 which distinguish
"compatibility character" from "compatibility composite character".
In any case people can't be prevented from doing things that are
officially discouraged, especially as for some particular use it might
be wrong to discourage them. So if you are handling Roman numerals in
an application and wish your handling to be complete then
unfortunately you do have to take the compatibility Roman numerals
into account.
Yes, but people can be clearly discouraged from using them, and that is
not currently happening. It seems that currently if you come across a
character by browsing through the charts and want to discover if use of
it is officially discouraged you have to wade through huge databases and
hundreds of pages of text to find out if a particular set of properties
implies that use is discouraged. Well, even that won't tell me
definitively, for I read, "The compatibility decomposable characters are
precisely defined in the Unicode Character Database, whereas the
compatibility characters in the more inclusive sense are not." (from
section 2.3) - and it is the latter whose use is discouraged. But is it
in fact safe to assume that the list of such characters includes, but is
not limited to, those which have defined compatibility mappings?
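The narrower, precisely defined question (does this character have a compatibility mapping in the Unicode Character Database?) can at least be answered mechanically. A minimal Python sketch using the standard `unicodedata` module; the function name is hypothetical, and it says nothing about compatibility characters in the more inclusive sense, which the database does not list.

```python
import unicodedata


def has_compatibility_mapping(ch):
    # Compatibility decompositions carry a tag such as "<compat>" or "<super>";
    # canonical decompositions have no tag, and most characters have none at all.
    return unicodedata.decomposition(ch).startswith("<")
```

So U+216E ROMAN NUMERAL FIVE HUNDRED and U+00B2 SUPERSCRIPT TWO are flagged, while "D" and U+00E9 (which has only a canonical decomposition) are not.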

It would be much simpler if each such character were clearly labelled in
the code charts etc. DO NOT USE!, and with its glyph presented on a grey
background or in some other way to indicate its special status.
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




Jim Allan
2003-08-18 18:32:50 UTC
Post by Peter Kirk
It would be much simpler if each such character were clearly labelled in
the code charts etc. DO NOT USE!, and with its glyph presented on a grey
background or in some other way to indicate its special status.
I don't think people should be told so directly to NOT use an official
Unicode character unless the character is actually deprecated.

Over the years recommendations about particular characters in the
standard have sometimes changed and no-one can see all possible uses for
characters or all ways that applications might use some of them.

But greying the chart area for deprecated characters and singleton
canonical decomposable characters seems to me a good idea.
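Which characters count as singleton canonical decomposables can be worked out from the database rather than by reading the charts. The sketch below is an assumption about how one might test for them in Python with the standard `unicodedata` module; `is_singleton_canonical` is an invented name.

```python
import unicodedata


def is_singleton_canonical(ch):
    mapping = unicodedata.decomposition(ch)
    # No "<tag>" means the mapping is canonical; one code point means singleton.
    return bool(mapping) and not mapping.startswith("<") and len(mapping.split()) == 1
```

This picks out characters such as U+212B ANGSTROM SIGN and U+2126 OHM SIGN, and leaves alone ordinary precomposed letters like U+00E9, whose canonical decomposition has two code points.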

As to compatibility characters, remember some of them, for example
spaces with varying widths, make essential differences in formatting.
The standard warns applications not to be hasty in unifying compatibility
characters for presentation.
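The point about spaces is easy to demonstrate: compatibility normalization folds a fixed-width space into an ordinary one, and the formatting difference is gone. A short Python illustration with the standard `unicodedata` module (variable names are arbitrary):

```python
import unicodedata

text = "a\u2003b"  # U+2003 EM SPACE between the letters
folded = unicodedata.normalize("NFKC", text)

print(unicodedata.decomposition("\u2003"))  # the compatibility mapping of EM SPACE
print(folded == "a b")                      # the EM SPACE has become U+0020
```

An application that applied NFKC before display would therefore lose the width distinction the author put in.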

If it is not deprecated a character should be usable.

But some more obvious graphic indication would be nice, to signal
that perhaps a user should think carefully about using that
particular encoded character.

Jim Allan

Peter Kirk
2003-08-18 19:12:38 UTC
Post by Jim Allan
Post by Peter Kirk
It would be much simpler if each such character were clearly labelled in
the code charts etc. DO NOT USE!, and with its glyph presented on a grey
background or in some other way to indicate its special status.
I don't think people should be told so directly to NOT use an official
Unicode character unless the character is actually deprecated.
OK, DO NOT USE! is too strong, but something like NOT RECOMMENDED! could
be used instead.
Post by Jim Allan
Over the years recommendations about particular characters in the
standard have sometimes changed and no-one can see all possible uses
for characters or all ways that applications might use some of them.
Well, such things need not be frozen from version to version. And a note
could read NOT RECOMMENDED except in the case of...
Post by Jim Allan
But greying the chart area for deprecated characters and singleton
canonical decomposable characters seems to me a good idea.
As to compatibility characters, remember some of them, for example
spaces with varying widths, make essential differences in formatting.
The standard warns applications not to be hasty in unifying
compatibility characters for presentation.
Well, that's what was puzzling me about the recommendations not to use
these characters. In my opinion, there needs to be a clear statement
with each character definition (not somewhere in the text not linked to
it) of its status in such respects. Is it for compatibility use only? Is
it a presentation form not for use in general information interchange?
Is it a formatting variant of another character, which should be used if
that special formatting is to be indicated although the two might be
collated together?

For example, if I want a superscript 2 to indicate "squared" (which
someone used on this list recently), am I supposed to use U+00B2, or
should I avoid using it and instead use a higher level markup (which
implies I need to use HTML e-mail)? Maybe the text tells me somewhere,
but it certainly doesn't in the code chart.
Post by Jim Allan
If it is not deprecated a character should be usable.
I thought even deprecated ones were supposed to be usable, in that a
system should process them correctly.
Post by Jim Allan
But some more obvious graphic indication would be nice, to signal
that perhaps a user should think carefully about
using that particular encoded character.
Agreed.
Post by Jim Allan
Jim Allan
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




Rick McGowan
2003-08-18 19:07:06 UTC
Someone suggested...
Post by Peter Kirk
It would be much simpler if each such character were clearly labelled in
the code charts etc. DO NOT USE!, and with its glyph presented on a grey
background or in some other way to indicate its special status.
Well, sure, I agree that it might be nice to document some of the
discouraged and deprecated characters somewhere in a way that people could
find easily, but putting gray boxes in the charts isn't the way.

Perhaps we should also put in blinking bold neon letters the disclaimer
Post by Peter Kirk
Disclaimer
These charts are provided as the on-line reference to the character
contents of the Unicode Standard, Version 4.0 but do not provide all
the information needed to fully support individual scripts using the
Unicode Standard. For a complete understanding of the use of the
characters contained in this excerpt file, please consult the
appropriate sections of The Unicode Standard, Version 4.0
(ISBN 0-321-18578-1), as well as Unicode Standard Annexes #9,
#11, #14, #15, #24 and #29, the other Unicode Technical Reports
and the Unicode Character Database, which are available on-line.
Before using things in the standard, people really should check out what
they are using! There are lοts of things that look really similar but have
wildly different semantics and оne might n০t want t੦ use things
indiscriminantly based s๐lely ᅌn what's in the charts...
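Several of the "o"s in the paragraph above are not the Latin letter they appear to be. A quick way to see what a suspicious word really contains is to list the names of its non-ASCII characters; the Python sketch below uses the standard `unicodedata` module, and the function name is made up for this note.

```python
import unicodedata


def non_ascii_characters(text):
    # List (code point, character name) for every character outside ASCII.
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
            for ch in text if ord(ch) > 0x7F]
```

Applied to a word spelled with U+03BF in place of "o", it reports GREEK SMALL LETTER OMICRON; applied to a genuinely ASCII word it returns an empty list.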

Rick



Jim Allan
2003-08-18 22:41:08 UTC
Post by Peter Kirk
Well, that's what was puzzling me about the recommendations not to use
these characters. In my opinion, there needs to be a clear statement
with each character definition (not somewhere in the text not linked to
it) of its status in such respects. Is it for compatibility use only? Is
it a presentation form not for use in general information interchange?
Is it a formatting variant of another character, which should be used if
that special formatting is to be indicated although the two might be
collated together?
Perhaps a cross-reference to areas in the main text where that
particular character or kind of character is discussed when there is
some special mention in the main text.

Otherwise the various indications of distinction and compatibility
decomposition and canonical decomposition usually indicate a lot, if
the reader looks at them and learns to understand them.

But indeed the standard is somewhat inconsistent in sometimes coming
close to recommending not using compatibility characters at all and in
other cases recommending particular ones.
Post by Peter Kirk
For example, if I want a superscript 2 to indicate "squared" (which
someone used on this list recently), am I supposed to use U+00B2, or
should I avoid using it and instead use a higher level markup (which
implies I need to use HTML e-mail)? Maybe the text tells me somewhere,
but it certainly doesn't in the code chart.
Well if you are using unformatted text and want to use a superscript 2
then you don't have much choice. I suppose I could have sent "E=mc^2" or
"E=mc{squared}" or "E=mc<super>2" or something, but why would I when I have
Unicode? :-)

Actually superscript 2 is also in the Latin-1 character set. :-)

In http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf it states:

<< Therefore, the preferred means to encode superscripted letters or
digits, such as “1st” or “DC0016”, is by style or markup in rich text. >>

I would think that statement obvious since in technical writing and
mathematical writing it is theoretically possible for any displayable
character in Unicode to be superscripted or subscripted, and even
superscripted or subscripted to an already superscript or subscript
character, and so on.
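What a plain-text superscript amounts to in the character database can be checked directly. This is only an illustration in Python with the standard `unicodedata` module, and the variable names are arbitrary: it shows the compatibility mapping recorded for U+00B2 and what NFKC normalization does to a string that contains it.

```python
import unicodedata

SUPERSCRIPT_TWO = "\u00B2"

# The compatibility mapping recorded for SUPERSCRIPT TWO.
print(unicodedata.decomposition(SUPERSCRIPT_TWO))

# NFKC replaces the superscript with an ordinary digit, dropping the styling.
plain = unicodedata.normalize("NFKC", "E=mc" + SUPERSCRIPT_TWO)
print(plain)
```

After folding, "E=mc" plus the superscript becomes "E=mc2", which is a reminder that the superscript carries meaning that compatibility normalization throws away unless markup puts it back.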

Also in the code chart (http://www.unicode.org/charts/PDF/U0080.pdf)
U+00B2 SUPERSCRIPT TWO is given a compatibility decomposition to
"<super> *0032* 2". Similarly with other superscript characters.

But beyond all recommendations in the Unicode standard what is done
depends on what the user wants to do for a particular purpose in a
particular environment with particular fonts. There is no one correct
way that fits all users at all places and times, nor should there be.

If I am printing out a document on a particular system with particular
software and fonts in which plain text superscripts look to me better
than superscripts created by formatting regular numbers by the word
processor I am using then I will naturally in that time and place use
Unicode plain text superscripts.

That Unicode gives me the choice is a benefit I should take advantage of
without worrying that formatting regular numbers as superscript is
theoretically better than using compatibility characters.

Unicode is messy and complex mostly because character usage is messy and
complex and display technology is messy and complex and there are always
edge-cases and things that don't fit well.

But Unicode's keeping deprecated individual character encodings while
allowing applications to freely throw away non-deprecated canonical
decomposable encodings (which supposedly only exist because they should
not be thrown away) confuses me also.
Post by Peter Kirk
I thought even deprecated ones were supposed to be usable, in that a
system should process them correctly.
It depends on what is meant by "usable" and the "system" and
"correctly". No system has to support all of Unicode. Accordingly I
would not expect systems to support deprecated control characters or
fonts to go out of their way to support deprecated characters.

A system that does not support deprecated control codes (and even some
of the non-deprecated control codes) and does not support particular
characters (perhaps only because there are no fonts on the system with
those characters) can still be conformant to Unicode in what it supports.

A text editor that supports only fixed width fonts will probably not
support the special-width spaces properly but may still be Unicode
conformant.

Jim Allan



J***@aculab.com
2003-08-19 08:58:58 UTC
I disagree.

A post-Windows, post-Linux, Operating System for the 21st century intended
for global use, should ideally support the whole of Unicode.

There are, in fact, people working on such projects.
Jill


-----Original Message-----
From: Jim Allan [mailto:***@smrtytrek.com]
Sent: Monday, August 18, 2003 11:41 PM
To: ***@unicode.org
Subject: Re: Clones (was RE: Hexadecimal)


No system has to support all of Unicode.



Peter Kirk
2003-08-19 09:30:10 UTC
Post by J***@aculab.com
I disagree.
A post-Windows, post-Linux, Operating System for the 21st century intended
for global use, should ideally support the whole of Unicode.
There are, in fact, people working on such projects.
Jill
Well, whatever might be new about this OS, it is not its Unicode
support. Windows XP and Linux already support the whole of Unicode, more
or less.
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




J***@aculab.com
2003-08-19 09:00:02 UTC
Permalink
Yeah, I know. But like I said, who uses this?

I have a QWERTY keyboard in front of me. I use a standard en-GB key mapping.
Now I _could_ customise my keymap such that Right-Alt + HYPHEN MINUS yielded
MINUS SIGN. Wouldn't that be great? Then I could write things like "x = -5;"
unambiguously. But it would completely screw my C++ compiler.

And I also have to ask ... if I am actually WRITING a C++ compiler, should I
allow the use of MINUS SIGN to mean minus sign? (Actually, that question may
be answered by the specification of C++, so let's push it a bit further. If
I am inventing some successor language to C++, and am free to invent my own
specification, should I _then_ allow the use of MINUS SIGN?)

I'm not being Devil's advocate. I don't necessarily even expect anyone to
have a definitive answer. I only ask that the charts make clear what each
character is FOR, in sufficient detail that the answer to questions like the
above becomes obvious.

Jill

-----Original Message-----
From: John Cowan [mailto:***@reutershealth.com]
Sent: Monday, August 18, 2003 4:39 PM
To: ***@Aculab.com
Cc: ***@unicode.org
Subject: Re: Clones (was RE: Hexadecimal)
Post by J***@aculab.com
U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who
uses this?
The ASCII characters, because they have had to do double or triple
duty over the years when we had a very limited 7-bit character set,
often have several near-equivalents in Unicode that disambiguate their
*typographically* different purposes. Thus hyphen, minus sign, en dash,
and em dash have separate Unicode representations, though in ASCII they
are often written -, -, -- or -, and --- or -- respectively.
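
As an illustration of the distinction (not part of the original message), the characters mentioned can be listed with their formal names. This is a minimal sketch using Python's standard unicodedata module; the helper name describe is made up for the example.

```python
import unicodedata

# The ASCII character and the Unicode characters that disambiguate its uses.
DASHES = [0x002D, 0x2010, 0x2212, 0x2013, 0x2014]

def describe(cp):
    """Return 'U+XXXX NAME' for one code point."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

for cp in DASHES:
    print(describe(cp))
```

The names printed are HYPHEN-MINUS, HYPHEN, MINUS SIGN, EN DASH and EM DASH.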


J***@aculab.com
2003-08-19 09:32:49 UTC
Permalink
Well that just proves my point then.
There are indeed some things that DO need to support the whole of Unicode
(more or less).

Jill


-----Original Message-----
From: Peter Kirk [mailto:***@ntlworld.com]
Sent: Tuesday, August 19, 2003 10:30 AM
To: ***@Aculab.com
Cc: ***@unicode.org
Subject: Re: Clones (was RE: Hexadecimal)
Post by J***@aculab.com
I disagree.
A post-Windows, post-Linux, Operating System for the 21st century intended
for global use, should ideally support the whole of Unicode.
There are, in fact, people working on such projects.
Jill
Well, whatever might be new about this OS, it is not its Unicode
support. Windows XP and Linux already support the whole of Unicode, more
or less.
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/



Asmus Freytag
2003-08-19 12:57:25 UTC
Permalink
Compatibility characters:

The recommendations for compatibility characters are necessarily vague,
since their use in legacy data (and legacy environments) is strongly
dependent on what is (or was) customary in a given environment.

If a process merely warehouses text data (or parses only a very small
subset of characters for a special purpose, such as an HTML parser) then
merely preserving legacy characters is often the best strategy. However,
take the opposite example, of a process that actually scans the text for
roman numerals. In that case, ignoring the compatibility characters would
be a mistake, since legacy data of the kind for which these compatibility
characters were added would *only* contain roman numerals in this form.
They would *not* use the ASCII characters.
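
A rough sketch of what such a scanning process might do, assuming it is acceptable to fold the compatibility characters to Basic Latin letters with NFKC before parsing. The function name roman_to_int is hypothetical, and the parser is deliberately lenient (it does not reject malformed numerals such as "IIII").

```python
import unicodedata

VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(text):
    """Parse a Roman numeral written with Basic Latin letters or with the
    compatibility characters in U+2160..U+217F."""
    # NFKC replaces e.g. U+2167 ROMAN NUMERAL EIGHT with the letters "VIII".
    folded = unicodedata.normalize("NFKC", text).upper()
    total = 0
    for i, ch in enumerate(folded):
        value = VALUES[ch]  # KeyError for anything that is not a numeral letter
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if i + 1 < len(folded) and value < VALUES[folded[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int("MMIII"))               # Basic Latin letters
print(roman_to_int("\u216F\u216F\u2162"))  # compatibility characters
```

Both calls give the same year, so legacy data that uses only the compatibility forms is not missed.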

Processes that modify legacy data for re-export to a legacy system
obviously need to be intimately familiar with the legacy conventions, in a
way that could not possibly be documented in the Unicode Standard in all
details for every character and every legacy system.

Documentation in the code charts:

I agree with several of the comments that "hiding" the information about
special characters in running text makes it unnecessarily difficult to work
with the information. On the other hand, not everything can be succinctly
expressed in machine readable tables (some characters have complicated
usages), and even annotations in the name list have limits. They are
definitely not the place for lengthier discussions.

For Unicode 4.0 we have attempted to improve the situation by systematically
extracting the line-breaking related information into UAX#14, which at
least allows task-focused access. Information about mathematical usage of
characters is now collected in one place in UTR#25, partially duplicating,
and partially extending the information in the text of the standard, but
providing a single place of access. Further improvements are possible.
Personally I'd be in favor of some icon in the character names list that
simply indicates that a character is more fully discussed elsewhere - that
would make the code charts more useful as an index into the description of
the characters.

Mathematical operators:

Future extensions of programming languages should allow not only the MINUS
SIGN as an operator, but many other characters, for example LOGICAL AND and
LOGICAL OR, and as many other operators as appropriate for the language.

Input of the operators doesn't have to necessarily be done via a special
purpose keyboard. The use of input macros, editor substitution or similar
input technologies (e.g. turning && into LOGICAL AND) would be more
flexible. Some editors already support the display of highly formatted
program source code even though the underlying text backbone uses the
standard ASCII conventions of current programming languages. Just one
example is Source Insight from www.sourceinsight.com, which not only
represents >= etc. by single symbols, but can also correctly increase the
size of outer parentheses for nested expressions.
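
The substitution idea can be sketched as a pair of lookup tables: one for display or input, one for getting back to the ASCII backbone. The table contents and the function names to_display and to_ascii are assumptions made for this example, not the behaviour of any particular editor. The plain string replacement is naive: it would also rewrite string literals and comments, which a real tool would avoid by working on tokens.

```python
# Hypothetical mapping from ASCII operator spellings to Unicode operators.
ASCII_TO_UNICODE = {
    "&&": "\u2227",  # LOGICAL AND
    "||": "\u2228",  # LOGICAL OR
    ">=": "\u2265",  # GREATER-THAN OR EQUAL TO
    "<=": "\u2264",  # LESS-THAN OR EQUAL TO
    "!=": "\u2260",  # NOT EQUAL TO
}
UNICODE_TO_ASCII = {uni: asc for asc, uni in ASCII_TO_UNICODE.items()}
# MINUS SIGN folds to HYPHEN-MINUS, but not the other way round, because
# HYPHEN-MINUS is not always a minus.
UNICODE_TO_ASCII["\u2212"] = "-"

def to_display(source):
    """Replace ASCII operator spellings with single Unicode symbols."""
    for asc, uni in ASCII_TO_UNICODE.items():
        source = source.replace(asc, uni)
    return source

def to_ascii(source):
    """Fold Unicode operators back to the ASCII backbone a compiler expects."""
    for uni, asc in UNICODE_TO_ASCII.items():
        source = source.replace(uni, asc)
    return source

print(to_display("if (a >= b && c != d)"))
print(to_ascii("x = \u22125;"))
```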

A./



Jim Allan
2003-08-19 15:18:47 UTC
Permalink
Post by J***@aculab.com
Yeah, I know. But like I said, who uses this?
Books are normally produced today using computer typesetting. Look in
any mathematics text or any well printed book for minus signs. Hyphens
and minus signs are distinct (except when showing computer programming
in a monospaced font). Hyphen and minus sign have always been different
characters.

TeX and SGML and other pre-Unicode legacy typographical systems support
this difference which has always existed.

On common computer systems like the Macintosh and Windows which didn't
support the difference globally in their standard character sets in
pre-Unicode days it was customary to use the en-dash instead of a minus
sign in formatted text. Or you switched to special math-symbol fonts
when entering mathematical signs and other symbols.

Style sheets and books of tips for word processing and desktop
publishing almost always go into some detail about the various kinds of
dashes and the minus sign. So does the Unicode manual in its section on
punctuation.
Post by J***@aculab.com
And I also have to ask ... if I am actually WRITING a C++ compiler, should I
allow the use of MINUS SIGN to mean minus sign? (Actually, that question may
be answered by the specification of C++, so let's push it a bit further. If
I am inventing some successor language to C++, and am free to invent my own
specification, should I _then_ allow the use of MINUS SIGN?)
The symbols to be used for any computer language are part of the
definition of that computer language. Currently you can't legally use
U+2212 for any computer language I know of.

However, I will be surprised if computer languages do not start to take
advantage of the additional characters that are universally available
through Unicode.
Post by J***@aculab.com
I only ask that the charts make clear what each
character is FOR, in sufficient detail that the answer to questions like the
above becomes obvious.
Currently the manual assumes that a user who wants to use a character
will mostly already know what it is FOR or the user wouldn't want to use
it. That's a reasonable assumption to make to avoid expanding the manual
to five or six volumes at least. A small amount of typographical and
usage information on some characters is provided for the convenience of
font makers.

I would personally love to see an expanded version of the Unicode
manual, a sort of multi-volume encyclopedia of characters and their
history and uses.

Meanwhile Unicode tells us that a particular glyph is a normal glyph for
MINUS SIGN. That really should be enough. Most people know that math
symbols are generally not (yet?) implemented to actually DO their
function on computers. And it is hardly necessary for the purpose of the
manual that, for example, under % we should be told about its use for
modulus or for introducing a comment in some computer languages.

You don't complain that the charts don't tell you what U+00D7
MULTIPLICATION SIGN is for or U+00F7 DIVISION SIGN or U+0026 AMPERSAND.

As to supporting all of Unicode, see 2.12 in
http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf.

Must a cell phone, for example, support all of Unicode?

Must every font contain every Unicode character?

Partial support is quite conformant provided that what is supported is
supported according to the standard and data is not corrupted.

That doesn't mean that full support and impeccable rendering are not
desirable. They are in the long run. But a laptop user who generally uses
only English may not wish to have disk space taken up by East Asian fonts
or top-of-the-line publishing software that handles East Indian scripts
impeccably.

Government software for various governments may purposely support only a
particular subset of the Unicode character set.

Jim Allan




John Jenkins
2003-08-19 16:53:31 UTC
Permalink
Post by Jim Allan
Must every font contain every Unicode character?
FWIW, it's no longer possible for a TrueType/OpenType font to contain
every Unicode character with a distinct glyph. (Apple's LastResort
font does it, of course, but by virtue of rampant reuse of glyphs.)
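
The message does not spell out why. A likely reason, stated here as an assumption, is that TrueType/OpenType glyph IDs are 16-bit, so one font holds at most 65,535 glyphs. The sketch below counts the assigned characters in whatever Unicode version the running Python ships with (not necessarily 4.0) and compares that count with the limit. The name count_assigned is made up for the example.

```python
import sys
import unicodedata

MAX_GLYPHS = 0xFFFF  # glyph IDs are 16-bit, so at most 65,535 glyphs per font

def count_assigned():
    """Count code points that are assigned and are not surrogates or private use."""
    skip = {"Cn", "Cs", "Co"}  # unassigned, surrogate, private use
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) not in skip
    )

assigned = count_assigned()
print("Unicode", unicodedata.unidata_version)
print("assigned characters:", assigned)
print("fits in one font with distinct glyphs:", assigned <= MAX_GLYPHS)
```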

========
John H. Jenkins
***@apple.com
***@mac.com
http://homepage.mac.com/jhjenkins/



James H. Cloos Jr.
2003-08-19 19:04:35 UTC
Permalink
John> (Apple's LastResort font [contains every Unicode character],
John> of course, but by virtue of rampant reuse of glyphs.)

Does this generate glyphs like the following ASCII and UTF-8 art?

+--+ ┌──┐
|AB| │AB│
|CD| │CD│
+--+ └──┘

(Both included for the benefit of the utf8-impaired.)

I find it interesting, if so, that Apple uses a font to achieve that
rather than a bit of code in the rendering libs. I believe that
pango (Παν語) does it in the lib.
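
For illustration only, a "bit of code" of that kind could look like the sketch below, which builds the ASCII version of the box from a code point. The function name hex_box and the layout details are assumptions; this is not Pango's actual implementation.

```python
def hex_box(cp):
    """Return a four-line text box showing the hex digits of a code point."""
    digits = f"{cp:04X}"
    if len(digits) % 2:  # pad to an even number of digits
        digits = "0" + digits
    half = len(digits) // 2
    top, bottom = digits[:half], digits[half:]
    border = "+" + "-" * half + "+"
    return "\n".join([border, f"|{top}|", f"|{bottom}|", border])

print(hex_box(0xABCD))   # BMP code point: two digits per row
print(hex_box(0x1D11E))  # supplementary code point: three digits per row
```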

-JimC



Michael Everson
2003-08-19 19:45:39 UTC
Permalink
Post by James H. Cloos Jr.
John> (Apple's LastResort font [contains every Unicode character],
John> of course, but by virtue of rampant reuse of glyphs.)
Does this generate glyphs like the following ASCII and UTF-8 art?
No. It generates much much better glyphs than that. See
http://developer.apple.com/fonts/LastResortFont/
Post by James H. Cloos Jr.
I find it interesting, if so, that Apple uses a font to achieve that
rather than a bit of code in the rendering libs.
What Mac OS X does is when it encounters a Unicode character, it sees
if it's in the current font. If it's not, it starts looking through
all the other fonts until it finds one that is suitable. The Last
Resort Font has glyphs for all the characters, so it's the last one
looked at.
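
The lookup order described above can be sketched as follows. This is a simplified model, not Apple's code: fonts are represented as hypothetical (name, coverage) pairs, "suitable" is reduced to "contains the character", and the font names in the usage lines are only examples.

```python
def pick_font(char, current_font, other_fonts, last_resort):
    """Return the name of the first font that covers `char`.

    Each font is a (name, coverage) pair, where coverage is a set of characters.
    `last_resort` is the name of the catch-all font.
    """
    for name, coverage in [current_font, *other_fonts]:
        if char in coverage:
            return name
    # No ordinary font has the character: fall back to the catch-all font.
    return last_resort

current = ("Helvetica", set("abc"))
others = [("Hiragino", {"\u3042"}), ("Geeza", {"\u0628"})]

print(pick_font("a", current, others, "LastResort"))
print(pick_font("\u0628", current, others, "LastResort"))
print(pick_font("\u0D85", current, others, "LastResort"))  # Sinhala letter, not covered
```

Because the catch-all font is consulted only after the loop, it never shadows a real font that has the character.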
--
Michael Everson * * Everson Typography * * http://www.evertype.com


Owen Taylor
2003-08-19 20:24:30 UTC
Permalink
Post by Michael Everson
Post by James H. Cloos Jr.
John> (Apple's LastResort font [contains every Unicode character],
John> of course, but by virtue of rampant reuse of glyphs.)
Does this generate glyphs like the following ASCII and UTF-8 art?
No. It generates much much better glyphs than that. See
http://developer.apple.com/fonts/LastResortFont/
Of course, "better" here really depends on what you want.
Prettier? Yes. More useful for Joe User who gets Sinhala
spam? Yes. More useful if you are trying to debug why, in
a span of Arabic text, some characters aren't being located
in a font? Not really.
Post by Michael Everson
Post by James H. Cloos Jr.
I find it interesting, if so, that Apple uses a font to achieve that
rather than a bit of code in the rendering libs.
What Mac OS X does is when it encounters a Unicode character, it sees
if it's in the current font. If it's not, it starts looking through
all the other fonts until it finds one that is suitable. The Last
Resort Font has glyphs for all the characters, so it's the last one
looked at.
If you have a Last Resort style font, Pango should pick it up
as well (*). The hex boxes are only drawn when *no* font
on the system contains the character.

Regards,
Owen

(*) With some caveats about fontconfig configuration that I'm
not going to get into here.




Michael Everson
2003-08-19 21:08:10 UTC
Permalink
Post by Owen Taylor
Post by Michael Everson
No. It generates much much better glyphs than that. See
http://developer.apple.com/fonts/LastResortFont/
Of course, "better" here really depends on what you want. Prettier? Yes.
Thanks. :-)
Post by Owen Taylor
More useful for Joe User who gets Sinhala spam? Yes.
Exactly.
Post by Owen Taylor
More useful if you are trying to debug why, in a span of Arabic
text, some characters aren't being located in a font? Not really.
The glyph (if you look at it zoomed in enough) tells you the block
the character is encoded in. It doesn't tell you WHICH character
isn't in any usable font.
Post by Owen Taylor
If you have a Last Resort style font, Pango should pick it up as well.
I don't know what Pango is but I guess it isn't relevant to me...
--
Michael Everson * * Everson Typography * * http://www.evertype.com


Owen Taylor
2003-08-19 21:43:51 UTC
Permalink
Post by Michael Everson
Post by Owen Taylor
If you have a Last Resort style font, Pango should pick it up as well.
I don't know what Pango is but I guess it isn't relevant to me...
It was mentioned in the mail that you replied to (because of
its hex-box drawing) so I didn't feel a need to gloss.

Pango is a text layout library roughly along the lines of
Uniscribe/ATSUI/etc, developed largely by myself, with
lots of help from the open-source community, including
various people on this list.

See http://www.pango.org for really outdated content. (Not
much time to update the web page these days.)

If you don't use Linux or Unix, it's likely not relevant to
you. It's used pretty widely these days in that arena.

Regards,
Owen




Gerd Schumacher
2003-08-20 02:31:26 UTC
Permalink
Post by Jim Allan
Post by J***@aculab.com
Yeah, I know. But like I said, who uses this?
Books are normally produced today using computer typesetting. Look in
any mathematics text or any well printed book for minus signs. Hyphens
and minus signs are distinct (except when showing computer programming
in a monospaced font). Hyphen and minus sign have always been different
characters.
There are some more appearances of the hyphen, which depend on the
font design:

1. Double horizontal stroke, fairly similar to the equals sign, but not the
same.
2. Slanted single stroke, for example in Renaissance printing.
3. Double slanted stroke, as always used in Fraktur (blackletter).

None of them can be used as a minus sign, and replacing them by a minus
sign would disturb many fonts' appearance.

Gerd



Jim Allan
2003-08-19 20:51:19 UTC
Permalink
Post by Michael Everson
The Last
Resort Font has glyphs for all the characters, so it's the last one
looked at.
I hope that it is not just for that reason that it is the last one
looked at.

Jim Allan



Michael Everson
2003-08-19 21:05:20 UTC
Permalink
Post by Jim Allan
Post by Michael Everson
The Last Resort Font has glyphs for all the characters, so it's the last one
looked at.
I hope that it is not just for that reason that it is the last one looked at.
Eh? The system looks for Unicode glyphs in all the other fonts and if
there's no available glyph then the LRF is displayed. It's the "last
resort".

There are, of course, some lovely easter-eggs in the font....
--
Michael Everson * * Everson Typography * * http://www.evertype.com


John Cowan
2003-08-19 21:26:17 UTC
Permalink
Post by Michael Everson
No. It generates much much better glyphs than that. See
http://developer.apple.com/fonts/LastResortFont/
Out of mild curiosity: (a) what font did you use to create the legends
in the frame of each glyph; (b) are all the various representative glyphs
drawn from a common font, and if so, which one?

Defect reports based on the 236-page PDF:

p. 17 (Thaana): the bottom of the frame reads "THAAN" with a broken glyph
following.

p. 18 (Phoenician): the top of the frame reads "OENECIAN", the bottom is
blank, and the rest of the glyph is black.

p. 55 (Pahawh): bottom of frame is blank.

p. 63 (Syloti Nagri): both top and bottom read "SILOTI NAGRI".

p. 171 (Brahmi): both top and bottom read "BRAMHI".
--
One Word to write them all, John Cowan <***@reutershealth.com>
One Access to find them, http://www.reutershealth.com
One Excel to count them all, http://www.ccil.org/~cowan
And thus to Windows bind them. --Mike Champion


Michael Everson
2003-08-19 21:52:55 UTC
Permalink
Post by John Cowan
Post by Michael Everson
No. It generates much much better glyphs than that. See
http://developer.apple.com/fonts/LastResortFont/
Out of mild curiosity: (a) what font did you use to create the legends
in the frame of each glyph;
Chicago.
Post by John Cowan
(b) are all the various representative glyphs
drawn from a common font, and if so, which one?
Of course they were not. No such font exists.
Post by John Cowan
p. 17 (Thaana): the bottom of the frame reads "THAAN" with a broken glyph
following.
p. 18 (Phoenician): the top of the frame reads "OENECIAN", the bottom is
blank, and the rest of the glyph is black.
p. 55 (Pahawh): bottom of frame is blank.
p. 63 (Syloti Nagri): both top and bottom read "SILOTI NAGRI".
p. 171 (Brahmi): both top and bottom read "BRAMHI".
I will look into all of that, and thank you for it; but note that of
those only Thaana can be expected to display, as none of the others
have been encoded. So none of those could EVER be displayed; they are
just extra glyphs in the current font.
--
Michael Everson * * Everson Typography * * http://www.evertype.com


P***@sil.org
2003-09-02 13:48:03 UTC
Permalink
Post by Michael Everson
Post by John Cowan
p. 63 (Syloti Nagri): both top and bottom read "SILOTI NAGRI".
I will look into all of that, and thank you for it; but note that of
those only Thaana can be expected to display, as none of the others
have been encoded. So none of those could EVER be displayed; they are
just extra glyphs in the current font.
Syloti Nagri has been approved by UTC and assigned to A800..A82F, though
this is yet to be ratified by WG2 (presumably will happen in October) and
published in a new version of Unicode (will be 4.1) or an amendment to ISO
10646 (I don't know what timetable is in place for publishing further
amendments).



Peter Constable


Gerd Schumacher
2003-08-22 20:06:36 UTC
Permalink
On Roman number signs


Jill Ramonski scripsit;
Post by J***@aculab.com
I confess, I hadn't read ch14.pdf, and I probably should have done. My
fault. But I still believe that there should be something in the
machine-readable code charts themselves that says, of the Roman numerals,
"Don't use these characters - use the normal Latin letters instead". If
they really are there _SOLELY_ for round trip compliance with East Asian
standards, then, if I wish to put the year MMIII in a web page, I should
_NOT_ use the Roman letters. Furthermore, if I write software to interpret
Roman Numbers, I only need to interpret the Basic Latin letters, not the
Roman ones. My life as a webmaster and programmer is made so much SIMPLER by
not having to use the Roman letters. I would really like it if these, and
every single other character which is "only there for reasons of round
...
In quality printing (German, and I think not only German), the Roman numerals
and the related letters usually are not identical. At the least, the numerals
have a reduced advance width. Metal fonts usually had no separate punches for
Roman numerals, but the typesetters filed the punches a bit slimmer. The I,
the V, and the X may also have connecting top and bottom bars, the latter not
necessarily at the baseline. So you cannot say that they are simply cloned
letters.

OK, this might be a matter of smart font technologies, hopefully available
one day in standard PC applications, but as there are code points defined for
these numerals, they are and certainly will be used in Latin script for a well
understandable reason. Is there another solution for non-smart fonts?

In my opinion, the advice not to use these code points will not solve the
problem. There are in fact fonts containing very clearly distinct Roman
numerals, for example the Titus Cyberbit font of the Titus project at the
Frankfurt (Main) university.

Gerd Schumacher



