Character Encoding (Charset)
A character encoding (or charset) is a code system used to store and display text in various languages in electronic form. Common encodings that support Japanese are Extended Unix Code (EUC-JP), Japanese (JIS, also known as ISO-2022-JP), Japanese (Shift-JIS), and Unicode (UTF-8). Of these, Japanese (Shift-JIS) is probably the most common on web pages, while Japanese (JIS) is traditionally used for email.
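To show how different these encodings are, here is a small Python sketch. It encodes the same word, 日本語 ("Japanese"), with each encoding listed above, using Python's built-in codecs `euc_jp`, `iso2022_jp`, `shift_jis` and `utf-8`. The dictionary labels are only names chosen for this example.

```python
# Encode the same Japanese text with each encoding and show the raw bytes.
TEXT = "日本語"  # "Japanese"

# Display label -> Python codec name (labels are illustrative only).
CODECS = {
    "EUC-JP": "euc_jp",
    "JIS (ISO-2022-JP)": "iso2022_jp",
    "Shift-JIS": "shift_jis",
    "UTF-8": "utf-8",
}


def encode_all(text):
    """Return a dict mapping each label to the bytes of `text` in that encoding."""
    return {label: text.encode(codec) for label, codec in CODECS.items()}


if __name__ == "__main__":
    for label, data in encode_all(TEXT).items():
        # bytes.hex(sep) needs Python 3.8 or later
        print(f"{label:18} {len(data):2} bytes  {data.hex(' ')}")
```

The byte sequences are entirely different, so a program reading the text must know which encoding was used. ISO-2022-JP, for example, wraps the JIS codes in escape sequences (starting with byte 0x1B) that switch between character sets.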
Web servers should specify the correct charset when serving web pages. Alternatively, charset information can be given in a <meta> tag in the page itself. If your browser has the wrong encoding selected when viewing a page, the text may appear as mojibake (garbled, unreadable characters).
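Mojibake happens when bytes written in one encoding are decoded with another. The Python sketch below reproduces it by decoding Shift-JIS bytes as Windows-1252 (`cp1252`, picked here only as an example of a common "wrong" Western setting) and as UTF-8. The helper name `view_as` is hypothetical.

```python
# Reproduce mojibake: decode Shift-JIS bytes with the wrong encoding.
TEXT = "日本語"
sjis_bytes = TEXT.encode("shift_jis")


def view_as(data, encoding):
    """Decode `data` the way a browser set to `encoding` would display it.

    Bytes that are invalid in `encoding` become U+FFFD replacement characters.
    """
    return data.decode(encoding, errors="replace")


correct = view_as(sjis_bytes, "shift_jis")    # the right encoding
wrong_western = view_as(sjis_bytes, "cp1252")  # prints “ú–{Œê
wrong_utf8 = view_as(sjis_bytes, "utf-8")      # mostly replacement characters

if __name__ == "__main__":
    print(correct)
    print(wrong_western)
    print(wrong_utf8)
```

The bytes themselves never change; only their interpretation does. That is why selecting the correct encoding in the browser makes the page readable again.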
Unfortunately, a charset given in a <meta> tag often has no effect. If the web server sends a Content-Type header with a charset, as it is supposed to, the browser uses that information instead. On an Apache web server, the easiest way to change the charset sent for a page is an .htaccess file, provided the server's AllowOverride setting permits it.
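The precedence described above can be modelled with a short Python sketch. It uses the charset from the HTTP Content-Type header if there is one. Otherwise it looks for a <meta> charset near the top of the page, and failing that it falls back to a default. This is a simplification: real browsers follow the more detailed HTML encoding-sniffing rules (a byte-order mark, for instance, beats both the header and the <meta> tag). The function name `pick_charset` and the `windows-1252` default are assumptions made for this example.

```python
import re
from email.message import Message

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([\w-]+)""", re.IGNORECASE)


def pick_charset(content_type, body, default="windows-1252"):
    """Simplified model of how a browser chooses a page's charset.

    content_type: value of the HTTP Content-Type header, or None.
    body: the raw page bytes.
    """
    # 1. A charset in the HTTP header wins over anything in the page.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None if absent
        if charset:
            return charset
    # 2. Otherwise look for a <meta> charset in the first 1024 bytes.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Otherwise fall back to a default (an assumption of this sketch).
    return default


if __name__ == "__main__":
    page = b'<html><head><meta charset="Shift_JIS"></head><body>...</body></html>'
    # The server's header overrides the page's own <meta> tag:
    print(pick_charset("text/html; charset=ISO-8859-1", page))
    # With no charset in the header, the <meta> tag is used:
    print(pick_charset("text/html", page))
```

The first call prints `iso-8859-1` even though the page itself says Shift_JIS, which is exactly the situation in which the <meta> tag appears not to work. The second prints `shift_jis` because the header carries no charset.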
    # .htaccess
    # myfile.html is Shift_JIS encoded
    <Files myfile.html>
        AddCharset Shift_JIS .html
    </Files>
More than you want to know about Encoding
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets
- A complete introduction to Japanese character encodings
- Kanji code tables in Shift-JIS, also available for Unicode and EUC-JP: http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml
I'll come back and clean these up after I finish digging through my bookmarks for more Japanese Information Processing type links.
--Zengargoyle 14:18, 12 July 2006 (EDT)