Questions on Myanmar encoding

Discussion:

Eric Muller

2003-09-18 20:31:51 UTC

1. What is encoded by the sequence of characters

U+1004 င MYANMAR LETTER NGA
U+1039 ◌္ MYANMAR SIGN VIRAMA
U+1004 င MYANMAR LETTER NGA

Is it kinzi + consonant NGA, or consonant NGA + subscript consonant NGA?
Should we add some words to Table 10.3 to clarify that?

2. Does consonant + subscript consonant NGA ever appear? If so, how is
it rendered? If not, should we remove U+1004 from the third row of Table
10.3?

3. About Table 10.3: is it true that *in the encoding model* a cluster
is always made of one element of each row, with row 2 (consonant)
mandatory and the other rows optional?

4. Is that model realistic, or are there exceptions, that is, real-life
situations that it does not capture? Or cases where the encoding is
possible, but not intuitive (e.g. two clusters in the encoding instead
of one)?

5. Is it "correct" to view the kinzi as a medial form of NGA, which just
happens to be encoded at the front of the cluster? For what values of
"correct"?

6. Finally, I have tried to encode various strings I have seen in print
(or rather as pictures of printed stuff). I would really appreciate it if
somebody could check my encodings. By the way, I found the introduction
to the Burmese script on that site very interesting. In particular, not
having to consider encoding made the presentation more accessible (i.e.
it provides the level of expertise needed to understand the "Composite
Characters" subhead in section 10.3).

Thanks,
Eric.

[Image not preserved]

U+1018 MYANMAR LETTER BHA
U+102C MYANMAR VOWEL SIGN AA
U+1015 MYANMAR LETTER PA
U+1039 MYANMAR SIGN VIRAMA
U+101B MYANMAR LETTER RA
U+1031 MYANMAR VOWEL SIGN E
U+102C MYANMAR VOWEL SIGN AA
U+1010 MYANMAR LETTER TA
U+102C MYANMAR VOWEL SIGN AA
U+101C MYANMAR LETTER LA
U+1032 MYANMAR VOWEL SIGN AI
U+0020 SPACE
U+1001 MYANMAR LETTER KHA
U+1004 MYANMAR LETTER NGA
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER
U+1017 MYANMAR LETTER BA
U+1039 MYANMAR SIGN VIRAMA
U+101A MYANMAR LETTER YA
U+102C MYANMAR VOWEL SIGN AA
U+1038 MYANMAR SIGN VISARGA

[Image not preserved]

U+1019 MYANMAR LETTER MA
U+102C MYANMAR VOWEL SIGN AA
U+1010 MYANMAR LETTER TA
U+102D MYANMAR VOWEL SIGN I
U+1000 MYANMAR LETTER KA
U+102C MYANMAR VOWEL SIGN AA

[Image not preserved]

U+1021 MYANMAR LETTER A
U+1013 MYANMAR LETTER DHA
U+1031 MYANMAR VOWEL SIGN E
U+101B MYANMAR LETTER RA
U+102D MYANMAR VOWEL SIGN I
U+1000 MYANMAR LETTER KA
U+1014 MYANMAR LETTER NA
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER
U+1012 MYANMAR LETTER DA
U+1031 MYANMAR VOWEL SIGN E
U+102C MYANMAR VOWEL SIGN AA
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER
U+101C MYANMAR LETTER LA
U+102C MYANMAR VOWEL SIGN AA
U+0020 SPACE
U+1042 MYANMAR DIGIT TWO
U+1048 MYANMAR DIGIT EIGHT
U+0020 SPACE
U+1042 MYANMAR DIGIT TWO
U+002C COMMA
U+1040 MYANMAR DIGIT ZERO
U+1040 MYANMAR DIGIT ZERO
U+1040 MYANMAR DIGIT ZERO
U+0020 SPACE
U+1000 MYANMAR LETTER KA
U+1030 MYANMAR VOWEL SIGN UU
U+100A MYANMAR LETTER NNYA
U+102E MYANMAR VOWEL SIGN II

[Image not preserved]

U+1021 MYANMAR LETTER A
U+1039 MYANMAR SIGN VIRAMA
U+101D MYANMAR LETTER WA
U+1014 MYANMAR LETTER NA
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER
U+101C MYANMAR LETTER LA
U+102F MYANMAR VOWEL SIGN U
U+102D MYANMAR VOWEL SIGN I
U+1004 MYANMAR LETTER NGA
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER
U+1038 MYANMAR SIGN VISARGA
U+1017 MYANMAR LETTER BA
U+102E MYANMAR VOWEL SIGN II
U+1007 MYANMAR LETTER JA
U+102C MYANMAR VOWEL SIGN AA
U+101C MYANMAR LETTER LA
U+1039 MYANMAR SIGN VIRAMA
U+101A MYANMAR LETTER YA
U+1039 MYANMAR SIGN VIRAMA
U+101F MYANMAR LETTER HA
U+1031 MYANMAR VOWEL SIGN E
U+102C MYANMAR VOWEL SIGN AA
U+1000 MYANMAR LETTER KA
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER
U+1014 MYANMAR LETTER NA
U+102F MYANMAR VOWEL SIGN U
U+102D MYANMAR VOWEL SIGN I
U+1004 MYANMAR LETTER NGA
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER

[Image not preserved]

U+1010 MYANMAR LETTER TA
U+102D MYANMAR VOWEL SIGN I
U+101B MYANMAR LETTER RA
U+1005 MYANMAR LETTER CA
U+1039 MYANMAR SIGN VIRAMA
U+1006 MYANMAR LETTER CHA
U+102C MYANMAR VOWEL SIGN AA
U+1014 MYANMAR LETTER NA
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER
U+1025 MYANMAR LETTER U
U+101A MYANMAR LETTER YA
U+1039 MYANMAR SIGN VIRAMA
U+101A MYANMAR LETTER YA
U+102C MYANMAR VOWEL SIGN AA
U+1025 MYANMAR LETTER U
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER
U+1018 MYANMAR LETTER BHA
U+102F MYANMAR VOWEL SIGN U
U+102D MYANMAR VOWEL SIGN I
U+1037 MYANMAR SIGN DOT BELOW
U+0020 SPACE
U+2018 LEFT SINGLE QUOTATION MARK
U+1015 MYANMAR LETTER PA
U+1004 MYANMAR LETTER NGA
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER
U+1012 MYANMAR LETTER DA
U+102C MYANMAR VOWEL SIGN AA
U+1014 MYANMAR LETTER NA
U+102E MYANMAR VOWEL SIGN II
U+2019 RIGHT SINGLE QUOTATION MARK
U+0020 SPACE
U+101B MYANMAR LETTER RA
U+1031 MYANMAR VOWEL SIGN E
U+102C MYANMAR VOWEL SIGN AA
U+1000 MYANMAR LETTER KA
U+1039 MYANMAR SIGN VIRAMA
U+200C ZERO WIDTH NON-JOINER

To Unsubscribe, send a blank message to: unicode-***@yahooGroups.com

This mailing list is just an archive. The instructions to join the true Unicode List are on http://www.unicode.org/unicode/consortium/distlist.html

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/

Maung TunTunLwin

2003-09-20 01:15:28 UTC

Hello Mr. Eric Muller,

I am a newcomer to Myanmar Unicode encoding. Although I am not fully
satisfied with the current encoding, I will try to answer your questions.

Post by Eric Muller
1. What is encoded by the sequence of characters
U+1004 င MYANMAR LETTER NGA
U+1039 ◌္ MYANMAR SIGN VIRAMA
U+1004 င MYANMAR LETTER NGA
Is it kinzi + consonant NGA, or consonant NGA + subscript consonant NGA?
Should we add some words to Table 10.3 to clarify that?

It is always kinzi, and that is always true for the Myanmar and Pali
scripts. The character Nga absolutely cannot appear as a subscript.
Although there is no word with Nga over Nga, it is perfectly legal to
think of one, and the one way to read it is as kinzi over Nga.
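
A minimal sketch of that reading, under the encoding model discussed in this thread: NGA + VIRAMA directly followed by a consonant is treated as kinzi over that consonant. The names `find_kinzi` and `is_consonant` are hypothetical, and treating U+1000..U+1021 as the consonant range is an assumption made for illustration.

```python
NGA = "\u1004"     # MYANMAR LETTER NGA
VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA


def is_consonant(ch: str) -> bool:
    # Assumption: U+1000..U+1021 is treated as the consonant range.
    return "\u1000" <= ch <= "\u1021"


def find_kinzi(text: str) -> list:
    """Return the indexes where NGA + VIRAMA is directly followed by a consonant."""
    hits = []
    for i in range(len(text) - 2):
        if text[i] == NGA and text[i + 1] == VIRAMA and is_consonant(text[i + 2]):
            hits.append(i)
    return hits


# The sequence from question 1: NGA, VIRAMA, NGA.
print(find_kinzi("\u1004\u1039\u1004"))
```

A sequence such as 1004 1039 200C 1017 is not reported, because the ZERO WIDTH NON-JOINER after the VIRAMA is not a consonant.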

Post by Eric Muller
2. Does consonant + subscript consonant NGA ever appear? If so, how is
it rendered? If not, should we remove U+1004 from the third row of Table
10.3?

Nga never occurs as a subscript consonant. If a renderer does encounter
one, I think the best thing is to show an illegal-character glyph, meaning
a rectangular box or similar. Or, if you wish, you can show a small Nga
below and let the user decide whether what he did is wrong. That is the
developer's choice, and I think it is perfectly legal.
I don't know what you mean by the third row of Table 10.3. As I said, I am
a newcomer; please tell me where to look.

Post by Eric Muller
3. About Table 10.3: is it true that *in the encoding model* a cluster
is always made of one element of each row, with row 2 (consonant)
mandatory and the other rows optional?

I don't know exactly what that means, but I will explain our Pasint rule;
maybe it will help you.

Under our Pasint rule, two consonants can be written as cons + subscript.
One thing to understand is that the upper character is not truly the
consonant of the current word. (I use "word" here for a group of
characters with a single sound.) It is the killer of the preceding word.
Of course, in sorting it goes with the preceding word, and the subscript
character becomes the true consonant of the current word.

So that means that, if there is no word concerting, the upper character is
the killer of the preceding word (optional) and the lower character is the
consonant (mandatory).

I think that will be a big problem for you. If you still have some
difficulty, ask me; I will try my best to explain.

Post by Eric Muller
4. Is that model realistic, or are there exceptions, that is, real-life
situations that it does not capture? Or cases where the encoding is
possible, but not intuitive (e.g. two clusters in the encoding instead
of one)?

I agree that the model still has some exceptions, especially in old Pali
usage and some word concerting. Apart from that, I would say this model is
good for glyph display but not good enough for user-friendliness and
sorting.

Post by Eric Muller
5. Is it "correct" to view the kinzi as a medial form of NGA, which just
happens to be encoded at the front of the cluster? For what values of
"correct"?

No, absolutely wrong. As I explained, kinzi is the killer of the preceding
word, and the same is true for any other character that has a subscript.

Post by Eric Muller
6. Finally, I have tried to encode various strings I have seen in print
(or rather as pictures of printed stuff). I would really appreciate it if
somebody could check my encodings. By the way, I found the introduction
to the Burmese script on that site very interesting. In particular, not
having to consider encoding made the presentation more accessible (i.e.
it provides the level of expertise needed to understand the "Composite
Characters" subhead in section 10.3).

With current model..
1018 102C 1039 101B 1031 102C 1010 102C 101C 1032 0020 1001 1004 1039 200C
1017 1039 101A 102C 1038 => "What you said?"

Correct. But it uses SPACE (0020); the current rules say to use ZWSP. I
also agree with using SPACE, because ZWSP does not show a visual break,
especially when not aligning. We need a visual break to prevent
misreading.
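
For anyone who wants to follow the ZWSP rule mentioned above, here is a minimal sketch that swaps U+0020 for U+200B only where the space sits between two Myanmar characters. The function name `spaces_to_zwsp` is hypothetical, and "Myanmar character" is assumed to mean the block U+1000..U+109F; whether the substitution is desirable is the judgment call described above.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# Assumption: "Myanmar character" means the block U+1000..U+109F.
_SPACE_BETWEEN_MYANMAR = re.compile(r"(?<=[\u1000-\u109F]) (?=[\u1000-\u109F])")


def spaces_to_zwsp(text: str) -> str:
    """Replace a single SPACE between two Myanmar characters with ZWSP."""
    return _SPACE_BETWEEN_MYANMAR.sub(ZWSP, text)


# Fragment of the first sample: ... 101C 1032 0020 1001 1004 ...
before = "\u101c\u1032 \u1001\u1004"
after = spaces_to_zwsp(before)
print([f"{ord(ch):04X}" for ch in after])
```

Spaces next to non-Myanmar characters, such as quotation marks or Latin text, are left untouched.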

1019 102C 1010 102D 1000 102C=>"Contents"

correct.
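
Sequences written as space-separated hexadecimal code points, as in this message, can be checked against the character names with a short script. This is a sketch using Python's standard `unicodedata` module; the helper names `decode_hex` and `describe` are hypothetical.

```python
import unicodedata


def decode_hex(seq: str) -> str:
    """Turn a string like '1019 102C' into the corresponding text."""
    return "".join(chr(int(cp, 16)) for cp in seq.split())


def describe(text: str) -> list:
    """List each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# The second sample from this thread.
word = decode_hex("1019 102C 1010 102D 1000 102C")
for line in describe(word):
    print(line)
```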

1021 1013 1031 101B 102D 1000 1014 1039 200C 1012 1031 102C 1039 200C 101C
102C 0020 1042 1048 0020 1042 002C 1040 1040 1040 0020 1000 1030 100A 102E=>
"US$28 2,00 ...?" I think help? 1000 1030 100A 102E 1015 102C

Just one character is wrong: 1031 in third place should be 1012. And
there should be no space between 18 2,00.

1021 1039 101D 1014 1039 200C 101C 102F 102D 1004 1039 200C 1038 1017 102E
1007 102C 101C 1039 101A 1039 101F 1031 102C 1000 1039 200C 1014 102F 102D
1004 1039 200C=> "Can apply VISA with on line"

correct.

1010 102D 101B 1005 1039 1006 102C 1014 1039 200C 1025 101A 1039 101A 102C
1025 1039 200C 1018 102F 102D 1037 0020 2018 1015 1004 1039 200C 1012 102C
1014 102E 2019 0020 101B 1031 102C 1000 1039 200C => " 'PandaNi' for zoo..."

I don't know what red Panda means, but the flow is correct, with just one
big mistake: there is no 1025 1039 200C, Letter U + killer. The characters
from 1021 to 102A cannot be used with the killer.

It is 1009 1039 200C. Character 1009 NYA has two glyphs. What you see in
the Unicode charts is the normal-form glyph. The other glyph is similar to
(you could say the same as) 1025 U. It is used only with the killer, and
when the character has a subscript, which is also another kind of killer.
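
The correction above can be sketched as a simple substitution: wherever U+1025 is directly followed by U+1039, use U+1009 instead. The function name `fix_killed_u` is hypothetical, and the sketch assumes that every U+1025 before a VIRAMA is a mistyped NYA, as described in this message.

```python
LETTER_U = "\u1025"    # MYANMAR LETTER U
LETTER_NYA = "\u1009"  # MYANMAR LETTER NYA
VIRAMA = "\u1039"      # MYANMAR SIGN VIRAMA


def fix_killed_u(text: str) -> str:
    """Replace LETTER U with NYA wherever it is directly followed by VIRAMA."""
    return text.replace(LETTER_U + VIRAMA, LETTER_NYA + VIRAMA)


# Fragment of the last sample: 102C 1025 1039 200C 1018
fixed = fix_killed_u("\u102c\u1025\u1039\u200c\u1018")
print(" ".join(f"{ord(ch):04X}" for ch in fixed))
```

A U+1025 that is not followed by U+1039, as in 1025 101A earlier in the same sample, is left unchanged.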

Nice to meet you. I also wish to change some encoding rules, but everybody
says TOO LATE.

Bye...
Maung TunTunLwin
***@myanmar.com.mm


Eric Muller

2003-09-24 16:20:48 UTC

Thank you very much for your help.

Post by Maung TunTunLwin
I don't know what you mean by the third row of Table 10.3.

It is in Unicode 4.0, section 10.3, page 273, and you can see it at:
<http://www.unicode.org/versions/Unicode4.0.0/ch10.pdf#G24999>

Post by Maung TunTunLwin
With current model..
1018 102C 1039 101B 1031 102C 1010 102C 101C 1032 0020 1001 1004 1039 200C
1017 1039 101A 102C 1038 => "What you said?"
Correct. But it uses SPACE (0020); the current rules say to use ZWSP.

Ok.

Post by Maung TunTunLwin
1021 1013 1031 101B 102D 1000 1014 1039 200C 1012 1031 102C 1039 200C 101C
102C 0020 1042 1048 0020 1042 002C 1040 1040 1040 0020 1000 1030 100A 102E=>
"US$28 2,00 ...?" I think help? 1000 1030 100A 102E 1015 102C
Just one character is wrong: 1031 in third place should be 1012.

my original: 1021 1013 1031...
your correction: 1021 1013 1012 ...

I am a bit confused, and looking more carefully, my new guess is: 1021
1019 1031... Apparently, that makes the first word sound like "american".

Post by Maung TunTunLwin
And there should
be no space between 18 2,00.

Post by Maung TunTunLwin
1010 102D 101B 1005 1039 1006 102C 1014 1039 200C 1025 101A 1039 101A 102C
1025 1039 200C 1018 102F 102D 1037 0020 2018 1015 1004 1039 200C 1012 102C
1014 102E 2019 0020 101B 1031 102C 1000 1039 200C => " 'PandaNi' for zoo..."
I don't know what red Panda means, but the flow is correct, with just one
big mistake: there is no 1025 1039 200C, Letter U + killer. The characters
from 1021 to 102A cannot be used with the killer.
It is 1009 1039 200C. Character 1009 NYA has two glyphs. What you see in
the Unicode charts is the normal-form glyph. The other glyph is similar to
(you could say the same as) 1025 U. It is used only with the killer, and
when the character has a subscript, which is also another kind of killer.

I think I understand. Also, I corrected 1018, which should be 101E.

Post by Maung TunTunLwin
Nice to meet you. I also wish to change some encoding rules, but everybody
says TOO LATE.

Just to be clear, I am not proposing any modification to the encoding
model. At best, I can think of clarifications that could help people
like me, who have limited knowledge of the script.

In another place in your message, you mention that the current model is
not optimal for sorting. I am not a specialist of sorting, but this is
not an entirely unusual situation. It is in general not possible to make
the encoding model such that it is optimal for all kinds of processing
(rendering, sorting, etc.). You may want to check the UCA carefully, to
see if and how it can handle proper sorting.

Eric.


Maung TunTunLwin

2003-09-25 17:14:39 UTC

Hello Mr. Eric Muller,

Post by Eric Muller
<http://www.unicode.org/versions/Unicode4.0.0/ch10.pdf#G24999>

Thanks.

Post by Eric Muller

my original: 1021 1013 1031...
your correction: 1021 1013 1012 ...
I am a bit confused, and looking more carefully, my new guess is: 1021
1019 1031... Apparently, that makes the first word sound like "american".

Sorry, my mistake. It should be the second place: 1013 -> 1012. You may be
right with your sample, but currently $ is used with 1012.

Post by Eric Muller

I think I understand. Also, I corrected 1018, which should be 101E.

1018 102F 102D 1037 (for), 101E 102F 102D 1037 (to). Both are usable.

Post by Eric Muller
Just to be clear, I am not proposing any modification to the encoding
model. At best, I can think of clarifications that could help people
like me, who have limited knowledge of the script.

I am also not trying to change the standard. I am currently trying to
figure out the limitations of the current encoding and looking for ways to
extend it.

Post by Eric Muller
In another place in your message, you mention that the current model is
not optimal for sorting. I am not a specialist of sorting, but this is
not an entirely unusual situation. It is in general not possible to make
the encoding model such that it is optimal for all kinds of processing
(rendering, sorting, etc.). You may want to check the UCA carefully, to
see if and how it can handle proper sorting.

Yes, I know, and thanks for your advice.
I have finally accepted that the encoding model is not optimal for
rendering and sorting. But there are still two things I am worried about:
One: the encoding model must allow quick word cutting for sort, wrap and
search.
-Currently I see a possibility for wrapping with Graphite.
Two: the encoding model must be usable with current rendering systems, or
it will remain a paper tiger (three years!).
-I see that it can work with Graphite and an intelligent input method. But
what about other systems? OpenType fonts don't handle line wrapping;
Uniscribe does. But what about handling Vowel Sign E (1031), moving it to
the front and back?

Sorry, I put in too much feeling.

Maung TunTunLwin
***@myanmar.com.mm
