Encoding

From WagaWiki

==Character Encoding (Charset)==
A character encoding (charset) is a coding system used to store and display text of various languages in electronic form.  Common encodings that support Japanese are EUC-JP (Extended Unix Code), JIS (ISO-2022-JP), Shift-JIS, and Unicode (UTF-8).  Of these, Shift-JIS and EUC-JP have traditionally been the most common on Japanese web pages, while JIS is used mainly for e-mail.

Web servers should specify the correct charset when serving web pages; alternatively, the charset can be given in a <meta> tag in the page itself.  If your browser uses the wrong encoding when viewing a page, the text may appear as [[mojibake]].
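
To make the idea concrete, here is a minimal Python sketch of how one piece of Japanese text turns into different bytes under each encoding, and how decoding with the wrong one produces mojibake.  The sample text and variable names are only illustrative; the codec names are the ones Python's standard library uses.

```python
# Sample text chosen for illustration only.
text = "日本語"

# Python codec names for EUC-JP, JIS (ISO-2022-JP), Shift-JIS and UTF-8.
codecs = ("euc_jp", "iso2022_jp", "shift_jis", "utf-8")

# The same three characters become a different byte sequence in each encoding.
encoded = {name: text.encode(name) for name in codecs}
for name, data in encoded.items():
    print(name, data.hex())

# Decoding with the encoding the bytes were written in restores the text.
round_trips = {name: data.decode(name) for name, data in encoded.items()}

# Decoding UTF-8 bytes as Shift-JIS does not: the result is mojibake.
# errors="replace" substitutes U+FFFD for bytes that are invalid in Shift-JIS.
mojibake = encoded["utf-8"].decode("shift_jis", errors="replace")
print(mojibake)
```

The bytes themselves carry no label saying which encoding they are in, which is why the server or the page has to declare it.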

Sadly, using a <meta> tag often has no effect.  If the web server sends a '''Content-Type''' header that includes a charset, as it is supposed to, the browser uses that charset and ignores the <meta> tag.  On an Apache server, the easiest way to override the character encoding of a web page is a ''.htaccess'' file.

<pre>
# .htaccess
# myfile.html is Shift-JIS encoded
<Files myfile.html>
  # Shift_JIS is the registered charset name for Shift-JIS
  AddCharset Shift_JIS .html
</Files>
</pre>
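
The precedence described above can be sketched in Python.  This is a simplified illustration, not how any particular browser is implemented: real browsers also check for a byte-order mark and may guess the encoding from the content.  The function names and the 1024-byte scan window are assumptions made for the example.

```python
import re
from email.message import Message

# Rough pattern for a charset declared in a <meta> tag (illustrative only).
META_RE = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, None if absent


def charset_from_meta(body):
    """Return the charset named in a <meta> tag near the top of the page."""
    match = META_RE.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, body):
    """The header wins; the <meta> tag is only a fallback."""
    return charset_from_header(content_type) or charset_from_meta(body)


page = b'<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
print(effective_charset("text/html; charset=Shift_JIS", page))
print(effective_charset("text/html", page))
```

In the first call the header names a charset, so the <meta> tag is never consulted; in the second the header has none, so the <meta> tag decides.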

==More than you want to know about Encoding==
* [http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets]
* [http://www.cs.mcgill.ca/~aelias4/encodings.html A complete introduction to Japanese character encodings]
* http://www.csse.monash.edu.au/~jwb/coding_inf.html
* http://www.herongyang.com/unicode/
* http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml -- also Unicode and EUC-JP
* http://www.jbrowse.com/text/unij.html
* http://www.alanwood.net/unicode/index.html

I'll come back and clean these up after I finish digging through my bookmarks for more Japanese Information Processing type links.

--[[User:Zengargoyle|Zengargoyle]] 14:18, 12 July 2006 (EDT)

[[Category:Computer]]

Current revision as of 08:13, 15 August 2006
