Character encodings – a short tutorial
May. 15th, 2011 05:36 pm

There is a profusion of terms for sets of letters, symbols, and characters and for how they are encoded on computers. I found myself embroiled in this mess back in the day at a speech-technology company, as we struggled to adapt a speech recognizer designed for English to cope with Korean, which turns out to have a surprisingly regular orthographic system. (Never mind that none of us on the team knew Korean.)
But I’m here today to talk about the key distinctions I learned in dealing with those letters, symbols, and characters on a computer. More after the jump, if you don’t mind me overexplaining a little.
Mirrored from Trochaisms.