What things are called (was Non-ascii string processing)

Discussion:

Jill Ramonsky

2003-10-07 10:20:09 UTC

Sigh! Things were a lot easier back in the old days of Unicode version
3, when default grapheme clusters were still called "glyphs". Okay, so
the general public still got it wrong, but that was just because they
were ignorant monkeys who didn't know any better, and it was up to the
likes of us to teach them the right words for things. :-) Now, instead,
we'll have to teach them to say "default grapheme cluster". How long do
you think it will be before it becomes acceptable to describe a console
or terminal emulator as being "80 default grapheme clusters wide and 25
default grapheme clusters high"? If I had to guess, I'd say ... never.

Of course, a default grapheme cluster is exactly what Johann was trying
to represent in 64 bits in his Excessive Memory Usage Encoding. It's
unfortunate that 64 bits just isn't enough for this purpose.
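Johann's Excessive Memory Usage Encoding isn't spelled out in this thread, so the sketch below only assumes the most naive fixed-width design: store each code point of a cluster in its own 21-bit field (21 bits is enough for U+10FFFF). Under that assumption a 64-bit word holds at most three code points, and a base letter with three combining marks already overflows it. `pack_cluster` is a hypothetical name, not part of any real library.

```python
# Hypothetical packing scheme; NOT Johann's actual encoding, which the
# thread does not describe.
BITS_PER_CODE_POINT = (0x10FFFF).bit_length()  # 21 bits covers every code point
WORD_BITS = 64

def pack_cluster(cluster: str) -> int:
    """Pack a cluster's code points into one integer, 21 bits each.

    Raises OverflowError if the cluster would need more than 64 bits.
    """
    needed = len(cluster) * BITS_PER_CODE_POINT
    if needed > WORD_BITS:
        raise OverflowError(f"{len(cluster)} code points need {needed} bits")
    value = 0
    for ch in cluster:
        value = (value << BITS_PER_CODE_POINT) | ord(ch)
    return value

print(hex(pack_cluster("e\u0301")))  # e + combining acute: 2 code points, 42 bits
try:
    pack_cluster("a\u0301\u0302\u0303")  # a + acute + circumflex + tilde: 84 bits
except OverflowError as err:
    print(err)
```

Since Unicode sets no fixed limit on how many combining marks may follow a base character, any fixed-size unit runs out eventually; wider fields only move the limit.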

It would be a whole lot easier if Unicode types would only use the same
words for things as the rest of the world. I suggest:
(1) A codepoint is still called a codepoint. No problem there.
(2) The object currently called a "character" be renamed as something
like "mapped codepoint" or "encoded codepoint", or possibly (coming in
from the other end) something like "sub-character" or "character
component" or "characterette" (which can be shortened to "charette" and
pronounced "carrot". :-) )
(3) The object currently called a "default grapheme cluster" be renamed
as "character".
(4) The object currently called a "tailored grapheme cluster" be renamed
as "tailored character"

This would make even /our/ conversations a lot less confusing.
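The levels in Jill's list (code point, encoded "character", user-perceived character) can be seen directly in a few lines of Python. The standard library has no grapheme-cluster segmenter, so the hypothetical helper `rough_clusters` below approximates one by attaching every combining mark (general category Mn, Mc or Me) to the code point before it. That is only a sketch; the full default grapheme cluster rules in UAX #29 also cover Hangul syllable sequences, CR LF and other cases it ignores.

```python
import unicodedata

def rough_clusters(text: str) -> list[str]:
    """Approximate user-perceived characters (hypothetical helper).

    Each combining mark (category M*) is appended to the previous cluster.
    This is NOT the full UAX #29 algorithm.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

word = "cafe\u0301"  # "café" spelled with e + COMBINING ACUTE ACCENT
print(len(word))                                # counted in code points
print(len(rough_clusters(word)))                # counted in rough clusters
print(len(unicodedata.normalize("NFC", word)))  # after composing e + acute into é
```

The first count is 5 and the other two are 4: the same visible word has different lengths depending on which of the three things you call a "character".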
Jill

------------------------ Yahoo! Groups Sponsor ---------------------~-->
KnowledgeStorm has over 22,000 B2B technology solutions. The most comprehensive IT buyers' information available. Research, compare, decide. E-Commerce | Application Dev | Accounting-Finance | Healthcare | Project Mgt | Sales-Marketing | More
http://us.click.yahoo.com/IMai8D/UYQGAA/cIoLAA/8FfwlB/TM
---------------------------------------------------------------------~->

To Unsubscribe, send a blank message to: unicode-***@yahooGroups.com

This mailing list is just an archive. The instructions to join the true Unicode List are on http://www.unicode.org/unicode/consortium/distlist.html

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/

Jill Ramonsky

2003-10-07 11:35:28 UTC

I have invented a new system, Unilib, for organising books in a library.

... Except that you're not allowed to call them "books" any more,
because I've already redefined the word "book" to mean "the physical
expression of a catalogue entry". Since what the user normally
experiences as a book may actually require several catalogue entries, we
can no longer use the word "book" for this object. Consequently, we need
a new word or phrase to describe what the user normally experiences as a
book. We tried calling them "volumes" back in Unilib 3.0, but it turned
out that that word was also used for something else. So now we call them
"default chapter clusters".

Hey - the public will just have to get used to it!

:-)

Jill

Doug Ewell

2003-10-07 15:42:20 UTC

Jill Ramonsky <Jill dot Ramonsky at Aculab dot com> wrote... Well, one

Post by Jill Ramonsky
:-)

OK, that's out of the way. What follows is not necessarily 100%
serious.

Post by Jill Ramonsky
I have invented a new system, Unilib, for organising books in a library.
... Except that you're not allowed to call them "books" any more,
because I've already redefined the word "book" to mean "the physical
expression of a catalogue entry". Since what the user normally
experiences as a book may actually require several catalogue entries,
we can no longer use the word "book" for this object. Consequently,
we need a new word or phrase to describe what the user normally
experiences as a book. We tried calling them "volumes" back in Unilib
3.0, but it turned out that that word was also used for something
else. So now we call them "default chapter clusters".

Actually, this is a great analogy to what is going on with Unicode
terminology, but probably not for the reason Jill had in mind.

There are plenty of examples of "books" as the user sees them that
contain one or more "books" as the author sees them. The Old and New
Testaments, and similar scriptural and philosophical material in many
belief systems, consist of many "books" that are bound together within a
hard cover. The Book of Genesis would be an awfully thin "book" if it
appeared on the shelf individually. Likewise, many great (and
not-so-great) literary works have been divided into "Book I" and "Book
II" by their authors.

This overloading of the word "book" can indeed lead to confusion and
misunderstanding, as when a high-school student with an assignment to
read and compare two books chooses "Book I" and "Book II" of the same
jointly bound work. When the Springfield Public Library takes an
inventory, they will probably continue to count each copy of the Bible
as one book, not as dozens.

My point is that Jill's Unilib didn't invent this confusion and
ambiguity.

Likewise, any character encoding standard that incorporates the concept
of "combining characters" is bound to experience the same sort of
confusion and ambiguity over the term "character." This is not unique
to Unicode; ISO 6937 has this problem as well with its (leading)
non-spacing marks. In ISO 6937, <0x61> is <a>, while <0xC2 0x61> is
<á>. Are both the one-byte and two-byte sequences "characters"? Does
that mean 0x61 is both a character in its own right and *part* of
another character? Do we need a separate word for whatever 0x61
represents?
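Doug's ISO 6937 example can be played out with a toy decoder. Python's standard library has no ISO 6937 codec, so `decode_iso6937_subset` below is a hypothetical sketch that handles only four of the non-spacing diacritics (0xC1 to 0xC4) and treats bytes below 0x80 as ASCII, which is itself a simplification. It shows the ordering difference: ISO 6937 puts the diacritic before the base letter, while Unicode puts the combining mark after it.

```python
import unicodedata

# Illustrative subset only: ISO 6937 non-spacing diacritics mapped to
# Unicode combining marks. The real repertoire is larger.
ISO6937_DIACRITICS = {
    0xC1: "\u0300",  # grave accent
    0xC2: "\u0301",  # acute accent
    0xC3: "\u0302",  # circumflex accent
    0xC4: "\u0303",  # tilde
}

def decode_iso6937_subset(data: bytes) -> str:
    """Hypothetical toy decoder: ASCII bytes plus four leading diacritics."""
    out: list[str] = []
    pending = None  # diacritic waiting for its base letter
    for b in data:
        if b in ISO6937_DIACRITICS:
            if pending is not None:
                raise ValueError("stacked diacritics not handled in this sketch")
            pending = ISO6937_DIACRITICS[b]  # ISO 6937: mark comes BEFORE the base
        elif b < 0x80:
            out.append(chr(b))
            if pending is not None:
                out.append(pending)  # Unicode: combining mark comes AFTER the base
                pending = None
        else:
            raise ValueError(f"byte 0x{b:02X} not handled in this sketch")
    if pending is not None:
        raise ValueError("diacritic at end of input has no base letter")
    # NFC composes e.g. a + U+0301 into the single code point U+00E1
    return unicodedata.normalize("NFC", "".join(out))

print(decode_iso6937_subset(b"\x61"))      # one byte
print(decode_iso6937_subset(b"\xC2\x61"))  # two bytes
```

Both inputs decode to a single code point, even though the second one also contains the byte 0x61, which is exactly the "character or part of a character?" ambiguity.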

Unicode greatly expanded the potential for this sort of complication, by
encoding all the lexical symbols (or whatever) of almost all modern
scripts and many archaic ones, and introducing many more types of
combining marks and interactions between them than any previous
character encoding. Unicode has also tried to reduce the confusion, by
introducing new terms. Sometimes the terms add confusion here as they
take it away there, but our only real alternative is to go back to the
days when we couldn't really talk about these things because they had no
name.
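One of the interactions between combining marks that Unicode had to specify is canonical ordering: marks attached to different sides of a base letter can be stored in either order, and normalization treats the two orders as equivalent by sorting the marks on their canonical combining class. The sketch below uses only the standard `unicodedata` module; the helper name `canonically_equivalent` is hypothetical.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b have the same NFD (canonical decomposition) form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# q + COMBINING DOT ABOVE (U+0307) + COMBINING DOT BELOW (U+0323), both orders
s1 = "q\u0307\u0323"
s2 = "q\u0323\u0307"

print(s1 == s2)                        # different code point sequences
print(canonically_equivalent(s1, s2))  # same text after canonical reordering
print(unicodedata.combining("\u0323"), unicodedata.combining("\u0307"))
```

Dot below has combining class 220 and dot above 230, so NFD puts the dot below first in both strings; the two sequences are one "character" to the reader, three code points to the encoding, and equal only after normalization.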

:-)

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

Peter Kirk

2003-10-07 17:40:21 UTC

Post by Doug Ewell
...
The Book of Genesis would be an awfully thin "book" if it
appeared on the shelf individually. ...

Not that thin, actually - 85 pages in my Hebrew Bible. But some of the
"books", e.g. Obadiah and 2 and 3 John, fit easily on one page. So your
point stands.

Post by Doug Ewell
Likewise, many great (and
not-so-great) literary works have been divided into "Book I" and "Book
II" by their authors.

This was, I think, based on the custom in classical times when a "book"
had a fixed maximum size rather smaller than it is today, based on the
size of a scroll or whatever, and so authors were forced to divide
their works into separate books. Of course many authors still do it even
though we now have printed books large enough. Well, actually books have
been large enough at least since the 4th century CE, when the first
one-volume copies of the full Greek Bible were produced. Three of these
4th- and 5th-century copies survive, two of them in the British Library.

Post by Doug Ewell
This overloading of the word "book" can indeed lead to confusion and
misunderstanding, as when a high-school student with an assignment to
read and compare two books chooses "Book I" and "Book II" of the same
jointly bound work. When the Springfield Public Library takes an
inventory, they will probably continue to count each copy of the Bible
as one book, not as dozens.

Then there is also the confusion of whether a multi-volume work counts
as one book or several. How many entries in the inventory for a
ten-volume encyclopedia? Ten or one? What if one volume is missing? What of
a supposedly multi-volume work whose volumes are published at wide
intervals? Some Bible commentary series are presented as multi-volume
works but volumes have been published in an arbitrary order, by various
authors, and sometimes replaced one at a time, in extreme cases for as
long as a century (the International Critical Commentary series). So the
concept of "book" becomes even more slippery than the concept of
"character".
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/

j***@spin.ie

2003-10-07 10:41:18 UTC

Post by Jill Ramonsky
(2) The object currently called a "character" be renamed as something
like "mapped codepoint" or "encoded codepoint", or possibly (coming in
from the other end) something like "sub-character" or "character
component" or "characterette" (which can be shortened to "charette" and
pronounced "carrot". :-) )

charette would just get confused with caret :)

Marco Cimarosti

2003-10-07 12:32:26 UTC

Post by Jill Ramonsky
Hey - the public will just have to get used to it!

No, the public should not be bored with these technical details: in the user
manual, a "book" will still be a "book". The fact that, in the source code
of the application, "book" means something else is of interest only to
programmers.

_ Marco

Jill Ramonsky

2003-10-07 12:43:39 UTC

Er, dude. It's called a sense of humor. Hence the smiley (which you
snipped).
Jill

-----Original Message-----
Sent: Tuesday, October 07, 2003 1:32 PM
Subject: RE: What things are called (was Non-ascii string processing)

Post by Jill Ramonsky
Hey - the public will just have to get used to it!
