Discussion:
Mojibake on my Web pages
Doug Ewell
2003-09-23 05:40:26 UTC
Apologies in advance to anyone who visits my Web site and sees garbage
characters, a.k.a. "mojibake." It isn't my fault.

Adelphia is currently having a character-set problem with their HTTP
servers. Apparently they are serving all pages as ISO 8859-1 even if
they are marked as being encoded in another character set, such as
UTF-8. So, instead of seeing U+2022 BULLET on my page, for example,
you'll see:

â€¢

If you manually change the encoding in your browser to UTF-8, or
download the page and display it as a local file, everything looks fine
because Adelphia's server is no longer calling the shots. Their tech
support people acknowledge that the problem is at their end and said
they would look into it.
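
The effect described here can be reproduced outside a browser. The following is only a minimal Python sketch, assuming the browser handles the "ISO 8859-1" label as windows-1252 (a common browser behavior, not something stated in this thread); the variable names are illustrative.

```python
# U+2022 BULLET as the author saved it: three UTF-8 bytes.
bullet = "\u2022"
utf8_bytes = bullet.encode("utf-8")  # b'\xe2\x80\xa2'

# What a reader gets when those bytes are decoded under the wrong label.
# Assumption: the "ISO 8859-1" label is treated as windows-1252 (cp1252).
mojibake = utf8_bytes.decode("cp1252")

# Switching the browser to UTF-8 by hand amounts to the reverse trip:
# recover the original bytes, then decode them with the right encoding.
repaired = mojibake.encode("cp1252").decode("utf-8")
```

Nothing in the file itself is damaged; only the label used to interpret the bytes differs, which is why a local copy displays correctly.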

I understand that having the "Unicode Encoded" logo on my page next to
these garbage characters may not reflect well on Unicode, especially to
newbies. I'm considering putting a disclaimer at the top of my pages,
but I'm waiting to see how quickly they solve the problem.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/



------------------------ Yahoo! Groups Sponsor ---------------------~-->
KnowledgeStorm has over 22,000 B2B technology solutions. The most comprehensive IT buyers' information available. Research, compare, decide. E-Commerce | Application Dev | Accounting-Finance | Healthcare | Project Mgt | Sales-Marketing | More
http://us.click.yahoo.com/IMai8D/UYQGAA/cIoLAA/8FfwlB/TM
---------------------------------------------------------------------~->

To Unsubscribe, send a blank message to: unicode-***@yahooGroups.com

This mailing list is just an archive. The instructions to join the true Unicode List are on http://www.unicode.org/unicode/consortium/distlist.html


Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
j***@att.net
2003-09-24 07:18:40 UTC
.
Doug Ewell wrote,
... I'm considering putting a disclaimer at the top of my pages,
but I'm waiting to see how quickly they solve the problem.
More than twenty-four hours, apparently.

You might try either changing the pages to NCRs or, uh, changing
servers...

Best regards,

James Kass
.


Stefan Persson
2003-09-24 12:21:13 UTC
Post by j***@att.net
Doug Ewell wrote,
... I'm considering putting a disclaimer at the top of my pages,
but I'm waiting to see how quickly they solve the problem.
More than twenty-four hours, apparently.
You might try either changing the pages to NCRs or, uh, changing
servers...
Is there no way to force the browsers to use the encoding as specified
in the documents instead of that specified by the server? I'm having
this problem myself with a different server, and would like to find a
solution to it. It is very irritating that the HTTP header overrules
the <meta> tag, since it seems that the error is more often in the HTTP
header than in the <meta> tag.

Stefan



Doug Ewell
2003-09-24 15:32:42 UTC
Post by Stefan Persson
Is there no way to force the browsers to use the encoding as specified
in the documents instead of that specified by the server? I'm having
this problem myself with a different server, and would like to find a
solution to it.
I can always visit View | Encoding and change the setting to UTF-8 on a
one-time basis. But as soon as the page is refreshed, it reverts to
whatever the server specifies.

I don't know if there's a way to teach IE that a given URL should
*always* be overridden to UTF-8, but even if there was, that would only
help me and those who know the secret. It should work for everybody.
Post by Stefan Persson
It is very irritating that the HTTP header overrules the <meta> tag,
since it seems that the error is more often in the HTTP header than in
the <meta> tag.
Indeed. You'd think if the author (or software) included a <meta> tag
AND an explicit declaration in the XML header, he (or it) knew what he
(or it) was doing and the tag(s) should be honored.

Apologies to the list if this is getting OT.

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/



j***@spin.ie
2003-09-24 17:01:21 UTC
Post by Doug Ewell
Post by Stefan Persson
It is very irritating that the HTTP header overrules the <meta> tag,
since it seems that the error is more often in the HTTP header than in
the <meta> tag.
Indeed. You'd think if the author (or software) included a <meta> tag
AND an explicit declaration in the XML header, he (or it) knew what he
(or it) was doing and the tag(s) should be honored.
Experience shows that there is no reason for assuming this degree of competence on the part of authors, certainly not over the degree of competence you assume for server administrators.

However, rather than being a judgement call on whether authors are more likely to include incorrect declarations (which they are) or server administrators to set incorrect headers (which they are also), the policy of having the HTTP header override the contained declaration has a sound technical basis:

The author was not the last entity to "touch" the document; the server was. As such, the server could have re-encoded the document (as some servers and other agents may do with text/* documents) without altering any self-description features specific to that particular type of document. So, assuming a reasonable degree of competence on the part of both author and server, only the server's description of the encoding can be trusted.
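
The precedence rule argued for here can be sketched in a few lines of Python. This is an illustration only, not any browser's actual algorithm: the function names are hypothetical, and the <meta> scan is a deliberately crude regular expression over the first 1024 bytes.

```python
import re
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, None if absent


def meta_charset(body):
    """Return the charset named in a <meta> tag near the top, or None."""
    match = re.search(
        rb"<meta[^>]*charset=[\"']?([A-Za-z0-9_-]+)", body[:1024], re.I
    )
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, body):
    """The HTTP header wins; the in-document declaration is a fallback."""
    return header_charset(content_type) or meta_charset(body)
```

With a page that declares utf-8 in its <meta> tag, `effective_charset("text/html; charset=ISO-8859-1", page)` yields `iso-8859-1`, while `effective_charset("text/html", page)` falls back to `utf-8`. That is the situation complained about in this thread: the author's declaration is only consulted when the server says nothing.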

In practice it doesn't work like that and browsers have to add features to enable users to manually change the encoding.

Maybe including a BOM would help the browser realise something was awry, but it's just as likely to think the author just wrote an invalid document that began with 






Peter Kirk
2003-09-24 19:27:19 UTC
Post by j***@spin.ie
Post by Doug Ewell
Post by Stefan Persson
It is very irritating that the HTTP header overrules the <meta> tag,
since it seems that the error is more often in the HTTP header than in
the <meta> tag.
Indeed. You'd think if the author (or software) included a <meta> tag
AND an explicit declaration in the XML header, he (or it) knew what he
(or it) was doing and the tag(s) should be honored.
Experience shows that there is no reason for assuming this degree of competence on the part of authors, certainly not over the degree of competence you assume for server administrators.
Few authors (and authoring programs) can have a lower degree of
competence than a server administrator who simply assumes that all
documents are CP1252 even when clearly mojibake. And authors should be
responsible for their own garbage without servers making misguided
attempts to clean it up.
--
Peter Kirk
***@qaya.org (personal)
***@qaya.org (work)
http://www.qaya.org/




Doug Ewell
2003-09-25 06:31:43 UTC
Post by j***@spin.ie
Maybe including a BOM would help the browser realise something was
awry, but it's just as likely to think the author just wrote an
invalid document that began with 
I've been told, hee hee hee, that the one thing I must NEVER NEVER do in
a Web page is to begin it with a BOM. But I admit I haven't tried that
yet. How funny would that be if it solved the problem?

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/



j***@att.net
2003-09-24 22:22:15 UTC
jon at spin dot ie wrote,
Post by j***@spin.ie
However, rather than being a judgement call on whether authors are more
likely to include incorrect declarations (which they are) or server
administrators to set incorrect headers (which they are also), the
policy of having the HTTP
The author was not the last entity to "touch" the document, the server
was. As such the server could have re-encoded the document (as some
servers and other agents may do with text/* documents) without altering
any self-description features specific to that particular type of
document. As such assuming a reasonable degree of competence on the part
of both author and server only the server's description of the encoding
can be trusted.
Suppose you made a document and sent it to me via conventional post.

The last agent handling the document would be the mail carrier.
Does the mail carrier have the right to open the mailing and
replace your document with garbage?

An analogy:

Author = Host
Document = Wine
Reader = Guest
Server = Cup

If the host pours a cup of wine for the guest, would we allow a
mere cup to adulterate our wine?

Best regards,

James Kass
.


Deepayan Sarkar
2003-09-24 23:32:09 UTC
Post by j***@att.net
Suppose you made a document and sent it to me via conventional post.
The last agent handling the document would be the mail carrier.
Does the mail carrier have the right to open the mailing and
replace your document with garbage?
Author = Host
Document = Wine
Reader = Guest
Server = Cup
If the host pours a cup of wine for the guest, would we allow a
mere cup to adulterate our wine?
I'm not saying it's necessarily the right thing to do, but there may be an
argument in favour of the server modifying the document. To take up your
examples, one might claim that the mail carrier has a legitimate right to,
say, irradiate the mail to kill off any possible Anthrax contamination. Or
that any sufficiently advanced cup is allowed to take action to remove any
poisonous substance from the wine served in it. Of course, these actions may
have unintended consequences until (if at all) the technology is perfected, and
one should have the option of 'turning off' these features. I think it's
understandable if the default settings are as defensive as possible.

I don't know what the HTTP server in question is, but this document may be
relevant (mentioned in Apache's configuration file, just above the option
that controls whether a default character set should be added to the
documents it serves):

http://httpd.apache.org/info/css-security/

Deepayan



j***@spin.ie
2003-09-25 09:03:35 UTC
Post by Doug Ewell
Post by j***@spin.ie
Maybe including a BOM would help the browser realise something was
awry, but it's just as likely to think the author just wrote an
invalid document that began with 
I really have to stop using this web-2-mail app; it managed to mangle my representation of a mangled BOM!
Post by Doug Ewell
I've been told, hee hee hee, that the one thing I must NEVER NEVER do in
a Web page is to begin it with a BOM. But I admit I haven't tried that
yet. How funny would that be if it solved the problem?
Well, with UTF-16 you really should use a BOM, but with UTF-8 there was a bit of a debate, which finally settled on the opinion that it was OK to do so. However, browsers are not necessarily going to agree with that opinion. Anyway, it couldn't hurt to try.
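
For reference, a UTF-8 BOM is three fixed bytes, so it is easy to test for. The Python sketch below shows the check and what those bytes turn into if a client ignores them and decodes as windows-1252 instead. It is an illustration under that assumption, not a statement about what any particular browser does, and the sample page content is made up.

```python
import codecs

bom = codecs.BOM_UTF8  # b'\xef\xbb\xbf'
page = bom + "\u2022 item".encode("utf-8")  # made-up page content

# Sniffing: a UTF-8 BOM can only appear at the very start of the data.
starts_with_bom = page.startswith(bom)

# If the BOM is not recognised and the bytes are read as windows-1252,
# it becomes three visible characters in front of the content.
bom_as_cp1252 = bom.decode("cp1252")

# Python's "utf-8-sig" codec strips a leading BOM while decoding.
text = page.decode("utf-8-sig")
```

A client that sniffs the BOM gets an unambiguous UTF-8 signal; one that does not will show three extra junk characters at the top of the page, which is the risk mentioned above.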






Martin Duerst
2003-09-25 20:32:27 UTC
Hello Doug, others,

Here is my most probable explanation:
Adelphia recently upgraded to Apache 2.0. The core config file (httpd.conf)
as distributed contains an entry
AddDefaultCharset iso-8859-1
which does what you have described. They probably adopted this
because the comment in the config file suggests that it's important.
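
To make the effect of that entry concrete, here is a rough Python model of the step the server performs. It is a hypothetical helper, not Apache code, and it assumes the behavior given in Apache's documentation for the directive: a charset parameter is added only to text/plain and text/html responses that do not already carry one.

```python
def add_default_charset(content_type, default="iso-8859-1"):
    """Model of the server step: append a charset if none is present.

    Hypothetical helper, not Apache code. default=None stands in for
    "AddDefaultCharset Off".
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    has_charset = "charset=" in content_type.lower()
    if default is None or has_charset:
        return content_type
    if media_type not in ("text/plain", "text/html"):
        return content_type
    return f"{content_type}; charset={default}"
```

Under this model a UTF-8 page served as plain `text/html` goes out labelled `text/html; charset=iso-8859-1`, and the header then takes precedence over whatever the page declares about itself.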

I have just filed a bug with bugzilla, asking that this default
setting be removed or commented out, and the comment fixed, at
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23421. You may
want to vote for that bug.

I have also commented on a related bug that I found, at
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14513.

I suggest you tell your Internet provider:
1) that they change to AddDefaultCharset Off
(or simply comment this out)
2) that they make sure you get FileInfo permission in your directories,
so that you can apply the settings you know are correct.

The comment in the config file contains mostly very strange statements:
#
# Specify a default charset for all pages sent out. This is
# always a good idea and opens the door for future internationalisation
# of your web site, should you ever want it. Specifying it as
# a default does little harm; as the standard dictates that a page
# is in iso-8859-1 (latin1) unless specified otherwise i.e. you
# are merely stating the obvious. There are also some security
# reasons in browsers, related to javascript and URL parsing
# which encourage you to always set a default char set.
#
AddDefaultCharset ISO-8859-1
If anybody knows something about these security issues, please
tell me (any mention of security issues usually has webmasters
in control, for good reasons).


Regards, Martin.
Post by Doug Ewell
Apologies in advance to anyone who visits my Web site and sees garbage
characters, a.k.a. "mojibake." It isn't my fault.
Adelphia is currently having a character-set problem with their HTTP
servers. Apparently they are serving all pages as ISO 8859-1 even if
they are marked as being encoded in another character set, such as
UTF-8.
If you manually change the encoding in your browser to UTF-8, or
download the page and display it as a local file, everything looks fine
because Adelphia's server is no longer calling the shots. Their tech
support people acknowledge that the problem is at their end and said
they would look into it.
I understand that having the "Unicode Encoded" logo on my page next to
these garbage characters may not reflect well on Unicode, especially to
newbies. I'm considering putting a disclaimer at the top of my pages,
but I'm waiting to see how quickly they solve the problem.
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/