sergebroom: (Gloater)
[personal profile] sergebroom

I just finished updating Sue's official web site ( http://www.susankrinard.com/ ). Every time, I keep expecting I'll break the code. It doesn't matter that I've done this quite a few times since I took over the site's maintenance, and that nothing has ever gone wrong.

Yes, I did notice that, on some of the site's pages, apostrophes are displayed as question marks, or little squares. I finally figured out that this is because the original text that I cut and pasted in employed actual apostrophes. Why this gives our browser a fit, I know not. I'll simply go thru each page and replace the apostrophes with single quotes. Sorry about the ghastly results.

Date: Feb. 11th, 2008 02:33 pm (UTC)
pedanther: Picture of the Pink Panther wearing brainy specs and an academic's mortar board, looking thoughtful. (pedantry)
From: [personal profile] pedanther
I'm pretty sure I know what the problem is, but it's kind of technical and the solution I would have recommended is the one you've already decided on anyway, so I won't try to explain it to you if you don't want me to.

Date: Feb. 11th, 2008 02:46 pm (UTC)
From: [identity profile] serge-lj.livejournal.com
Please do tell me. It might come in handy.

Date: Feb. 12th, 2008 12:58 am (UTC)
pedanther: Picture of the Pink Panther wearing brainy specs and an academic's mortar board, looking thoughtful. (pedantry)
From: [personal profile] pedanther
Okay, here goes. (Apologies if it's confusing, or conversely if I end up telling you stuff you know already.)

The key concept is character encoding, which is a way of converting the letters and symbols humans recognise as writing into numbers the computer can work with, and vice versa. This is done using a character set, which specifies the numbers that are equivalent to each character - for instance, that "A" is represented by the binary number 1000001, "B" by 1000010, and so on.
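
As a small illustration of that mapping, here is a sketch in Python (my choice of language for the example, not something the site uses). It looks up the number behind "A" and "B" and prints it in binary, then goes the other way.

```python
# Character -> number -> binary digits, and back again.
for ch in "AB":
    code = ord(ch)             # the number assigned to this character
    bits = format(code, "b")   # the same number written in binary
    print(ch, code, bits)

a_bits = format(ord("A"), "b")   # binary digits for "A"
back = chr(int("1000010", 2))    # binary digits -> number -> character
print(a_bits, back)
```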

The catch is that there is not just one character set, for various historical reasons, and if the web page is stored using one character set and the browser is reading it using a different character set, not all the character mappings will match. The alphabet and basic punctuation - essentially, anything that the US standard keyboard has a key for, including non-curly apostrophes - will be all right, because they were standardized in the 1960s and now all Latin-alphabet character sets use the same mappings for those characters; but anything more esoteric - including curly apostrophes - may have a different mapping in different character sets.

The result is that the browser reads the code that's supposed to mean "curly apostrophe", looks it up in the wrong character set, and displays either the wrong character or a symbol meaning "I don't know what this is" (which is what the question mark and the little square are).
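
To make the mismatch concrete, here is a hedged sketch in Python. I don't know which character sets are actually involved on the site, so Windows-1252 and UTF-8 are assumptions picked purely for illustration.

```python
curly = "\u2019"   # the curly apostrophe (right single quotation mark)

# Stored as Windows-1252, read back as UTF-8 (both encodings are assumed).
stored = curly.encode("cp1252")                      # a single byte, 0x92
as_utf8 = stored.decode("utf-8", errors="replace")   # not valid UTF-8 on its own

# The reverse mismatch: stored as UTF-8, read back as Windows-1252.
stored_utf8 = curly.encode("utf-8")                  # three bytes
as_cp1252 = stored_utf8.decode("cp1252")             # three wrong characters

# A plain apostrophe has the same byte in both, so it survives either way.
plain_same = "'".encode("cp1252") == "'".encode("utf-8")

print(repr(as_utf8), repr(as_cp1252), plain_same)
```

In the first case the decoder substitutes U+FFFD, the official "replacement character", which browsers typically draw as a question mark or a box; in the second case you get wrong characters instead.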

With me so far?

In most cases, the easiest thing to do about this problem is to avoid it, by sticking with the standard basic alphabet and punctuation, so that it doesn't matter which character sets are being used. (This is far easier than, for instance, trying to exert an iron grip on your visitors' web browsers' display settings.)
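
If going through each page by hand gets tedious, the replacement can be scripted. This is only a sketch: the mapping and the `to_plain` helper are hypothetical names of mine, and the sample sentence is made up.

```python
# Hypothetical mapping from curly punctuation to the plain keyboard versions;
# extend it with whatever else turns up in the pasted text.
PLAIN = str.maketrans({
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / curly apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
})

def to_plain(text):
    """Return text with curly quotes swapped for straight ones."""
    return text.translate(PLAIN)

sample = "Sue\u2019s \u201cofficial\u201d site"
cleaned = to_plain(sample)

# Anything still outside the basic set is worth a look by eye.
leftovers = [ch for ch in cleaned if ord(ch) > 127]
print(cleaned, leftovers)
```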

This doesn't mean you have to give up on other punctuation, by the way. HTML helpfully includes a system that allows you to represent esoteric punctuation using only standard characters. For instance, if you want to keep using curly apostrophes, you can write &rsquo; and the browser will display it as a right-handed single quote, like so: ’

For another example, &copy; is displayed as the copyright sign: ©
(I mention this because in my browser the little squares are also appearing in the copyright notice on your site where the copyright sign should be.)
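
If you'd rather keep the fancy punctuation, the conversion to HTML's standard-characters-only notation can also be scripted. Here is a sketch using Python's standard `html` module; the `to_entities` helper and the sample text are made up for the example. It uses a named reference where HTML defines one and falls back to a numeric reference otherwise.

```python
import html
from html.entities import codepoint2name

def to_entities(text):
    """Rewrite every non-ASCII character as an HTML character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                               # basic characters pass through
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")   # named reference
        else:
            out.append("&#%d;" % cp)                     # numeric fallback
    return "".join(out)

original = "Sue\u2019s site \u00a9 2008"
encoded = to_entities(original)
print(encoded)

# Sanity check: decoding the references gives back the original text.
round_trip = html.unescape(encoded)
```

Because the basic characters pass through untouched, existing tags in a page are left alone; only the characters that cause trouble are rewritten.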

Date: Feb. 12th, 2008 05:24 pm (UTC)
From: [identity profile] serge-lj.livejournal.com
Thanks for the explanation. I will keep this in mind, should there be a situation where I want unusual characters. Now, back to the code to replace those curly apostrophes with single quotes. Ah indeed, the dangers of cutting and pasting...