UTF-8, Greek and a Facebook sharer bug?

06 Jul 2010
comments below header

Fmag.gr is a photography news website. Every post has links at the bottom: "print this", "email this", and share links for Facebook, Digg, etc.

About a month ago, I noticed some strange behavior. When someone clicked the "Share on Facebook" link, Facebook parsed the article with the wrong encoding. I couldn’t explain it, because it started suddenly, without any change to the Drupal code on my side. I was convinced that Facebook was the problem and that it would be fixed soon. That did not happen.

Fmag.gr is UTF-8, like every generic Drupal website. Fmag.gr is also in Greek, but that should not matter: all Greek characters are included in the UTF-8 encoding/charset. So where was the problem? The W3C validation service rates fmag.gr as 100% valid and recognizes the UTF-8 encoding. I checked the MySQL collation: UTF-8 generic, fine. I configured httpd.conf to default to UTF-8, and php.ini to default to UTF-8. Nothing; the problem remained.
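
One way to double-check which charset a response declares is to parse its Content-Type header. Here is a minimal sketch in Python, using only the standard library. The function name declared_charset and the sample header values are made up for illustration; they are not taken from the fmag.gr setup.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() reads the "charset" parameter, if there is one.
    return msg.get_content_charset()


# Hypothetical header values, as a server might send them.
print(declared_charset("text/html; charset=UTF-8"))
print(declared_charset("text/html"))
```

A header without a charset parameter gives None, which is the case where a client has to guess the encoding from the page itself.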

Sometimes, for a second or two, the Facebook sharer (sharer.php) showed valid Greek characters and then changed everything back to garbage glyphs (as if UTF-8 were being rendered as ISO-xxx). I was mad and confused. Googling turned up nothing about a similar situation. I checked the PSPad editor, but nothing was wrong; it was saving perfect UTF-8 files. I made all those changes to my server configuration (even though I had never had encoding problems before) and I was 200% sure it was serving UTF-8 to the browser, but the problem remained. Above all, other Greek websites, using UTF-8 or ISO (or malformed UTF according to W3C validation), were being parsed by the Facebook sharer correctly as Greek!
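
The garbage glyphs are what you would expect when UTF-8 bytes are decoded with a single-byte charset. A small Python sketch of the effect follows. It assumes ISO-8859-1 as a stand-in for the unspecified "ISO-xxx", and the sample word is chosen only for illustration.

```python
# Sample Greek text (unaccented capitals, so every letter is one code point).
text = "ΦΩΤΟΓΡΑΦΙΑ"

# What a UTF-8 page actually sends over the wire.
utf8_bytes = text.encode("utf-8")

# Assumption: ISO-8859-1 plays the role of "ISO-xxx" here.
# Each Greek letter is two bytes in UTF-8, so the wrong charset
# turns every letter into two unrelated characters.
garbled = utf8_bytes.decode("iso-8859-1")

# Decoding the same bytes as UTF-8 gives the original text back.
restored = utf8_bytes.decode("utf-8")

print(garbled)
print(restored)
```

The bytes are identical in both cases; only the charset used to interpret them differs.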


Seconds before the Solution:

The comment tag is used to insert a comment in the source code. A comment will be ignored by the browser. You can use comments to explain your code, which can help you when you edit the source code at a later date.

You can also store program-specific information inside comments. In this case they will not be visible to the user, but they are still available to the program. A good practice is to comment out the text inside script and style elements to prevent older browsers that do not support scripting or styles from showing it as plain text.

A comment can appear anywhere in a document, even before the doctype, but not in other tags. (However, placing comments – or indeed any characters except for whitespace – before the doctype will cause Internet Explorer 6 to use quirks mode for the document).

I use comments a lot. Only one multi-line comment is oddly placed: after the doctype but before the head tag. I use it on every website, and it contains version and contact information.



Don’t ask me how, but I finally found a bug in the “Facebook sharer” parser. On the spur of the moment, I tried placing the multi-line comment described above below the head tag… and voilà! Greek in the Facebook sharer. The Facebook bug can therefore be described simply: if a comment is placed before the head tag, the Facebook sharer cannot detect the correct encoding. If your site is in English there is no problem, as ISO-xxx-1 (and many alternative encodings/charsets) includes all the mainstream English characters.

I placed the multi-line comment right after the head tag and now everything works OK. I still expect Facebook to correct this behavior, though. I don’t like having this kind of comment placed there, either visually or ergonomically.
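
To check other pages for the same pattern, something like the following Python sketch could list any comments that appear before the head tag. It uses the standard html.parser module. The class and function names, and the two sample pages, are hypothetical. In particular, the exact position of the comment relative to the html tag is an assumption, and this only detects the placement, not how Facebook's parser reacts to it.

```python
from html.parser import HTMLParser


class CommentBeforeHeadFinder(HTMLParser):
    """Collects HTML comments that occur before the opening <head> tag."""

    def __init__(self):
        super().__init__()
        self.head_seen = False
        self.early_comments = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "head":
            self.head_seen = True

    def handle_comment(self, data):
        if not self.head_seen:
            self.early_comments.append(data.strip())


def comments_before_head(html):
    finder = CommentBeforeHeadFinder()
    finder.feed(html)
    finder.close()
    return finder.early_comments


# Hypothetical pages: same content, comment before vs. inside the head.
META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
COMMENT = "<!-- version and contact info -->"

page_before = "<!DOCTYPE html>\n<html>\n" + COMMENT + "\n<head>" + META + "</head>\n<body></body></html>"
page_after = "<!DOCTYPE html>\n<html>\n<head>" + COMMENT + META + "</head>\n<body></body></html>"

print(comments_before_head(page_before))
print(comments_before_head(page_after))
```

The first page reports the comment; the second, with the comment moved inside the head, reports nothing.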