Japanese text and Notepad/HTML

Japanese, general discussion on the language
tanuki
Posts: 2302
Joined: Sun 09.25.2005 9:00 pm
Location: South America

Post by tanuki » Sun 06.24.2007 9:18 pm

Hi there!

I'm learning very, very basic HTML using Notepad. It's going great with the Latin alphabet, but when I tried using Japanese text, all I got in my test webpage was:
?????????
Bummer. As always, encoding issues (I guess). I tried using Word, but I couldn't get any HTML going in there for some reason. I even tried JWPCe and I got another kind of gibberish (random symbols).

Could anyone help me? Keep in mind that I'll probably only be able to understand simple answers ^^;. Thanks.
Please correct my poor Japanese.

Ezrach
Posts: 270
Joined: Tue 07.18.2006 12:05 am
RE: Japanese text and Notepad/HTML

Post by Ezrach » Sun 06.24.2007 9:46 pm

You've got to specify a character set in the header.

<html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    </head>
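"utf-8" is a Unicode encoding. Try using that first, and make sure the file itself is saved as UTF-8 (in Notepad, pick UTF-8 in the Encoding box of the Save As dialog), because the declaration only helps if it matches the bytes in the file. If it doesn't work, you'll have to explicitly declare a Japanese encoding such as Shift_JIS.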
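
To see why the saved encoding matters as much as the declared one, here is a small Python sketch. The sample text and the choice of cp1252 as the Western "ANSI" code page are assumptions for illustration, not something taken from the original test page.

```python
# Illustration only: the sample text and the encodings are assumptions.
text = "日本語のテスト"

# A Western "ANSI" code page (cp1252 assumed here) cannot represent Japanese,
# so every character is replaced with a question mark when the file is saved.
ansi_bytes = text.encode("cp1252", errors="replace")
print(ansi_bytes.decode("cp1252"))  # prints ???????

# Bytes saved as Shift_JIS but read as UTF-8 come out as random-looking symbols.
sjis_bytes = text.encode("shift_jis")
print(sjis_bytes.decode("utf-8", errors="replace"))

# Bytes saved as UTF-8 and read as UTF-8 round-trip cleanly.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.decode("utf-8"))
```

The first case looks like the row of question marks from the first post, and the second like the "random symbols" kind of gibberish.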

zengargoyle
Posts: 1200
Joined: Sun 05.29.2005 10:16 pm

RE: Japanese text and Notepad/HTML

Post by zengargoyle » Sun 06.24.2007 9:59 pm

do a 'view source' of a TJP page, you'll find:
<meta http-equiv='Content-Type' content='text/html; charset=x-sjis'>
in the <head> section. this should tell the browser that the text in the page is in SJIS encoding.

you probably don't have this utility, but maybe IE gives you a way to see the HTTP Response Headers for a page request:

$ HEAD 'http://www.thejapanesepage.com/forum/vi ... f=6&t=8615'
200 OK
Connection: close
Date: Mon, 25 Jun 2007 01:31:04 GMT
Server: Apache/1.3.33 (Unix)
Content-Type: text/html
Client-Date: Mon, 25 Jun 2007 01:33:04 GMT
Client-Peer: 82.165.133.249:80
Client-Response-Num: 1
Set-Cookie: fusion_visited=yes; expires=Tue, 24 Jun 2008 01:31:05 GMT; path=/
X-Powered-By: PHP/4.4.7
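
if you don't have that utility, a few lines of Python can show the same response headers. this is only a sketch: the function name response_headers is made up for illustration, and the URL you pass in is up to you.

```python
import urllib.request

def response_headers(url):
    """Send a HEAD request and return the response headers as (name, value) pairs."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return list(response.headers.items())

def print_headers(url):
    """Print each response header on its own line, like the listing above."""
    for name, value in response_headers(url):
        print(f"{name}: {value}")
```

calling print_headers with the address of a page should list its headers, including Content-Type if the server sends one.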

see that 'Content-Type:' header? it just says 'text/html' and doesn't have any character set information, so the browser will look inside the HTML for the <meta http-equiv="Content-Type" ...> tag to decide what encoding to use when rendering the page. if there isn't a 'meta' tag to specify the encoding, the browser will try to guess the encoding of the page. sometimes this works, sometimes it doesn't.

some HTTP servers can be configured to look at the <head> information and if they find a 'http-equiv="Content-Type" ...' meta tag then they will send back to the browser something like:

200 OK
Connection: close
Date: Mon, 25 Jun 2007 01:31:04 GMT
Server: Apache/1.3.33 (Unix)
Content-Type: text/html; charset=x-sjis
Client-Date: Mon, 25 Jun 2007 01:33:04 GMT
Client-Peer: 82.165.133.249:80
Client-Response-Num: 1
Set-Cookie: fusion_visited=yes; expires=Tue, 24 Jun 2008 01:31:05 GMT; path=/
X-Powered-By: PHP/4.4.7

(note that the Content-Type has changed to include the encoding information...)

the problem arises when your HTTP server has a 'default' encoding configured. if for instance your server is configured with a 'default' encoding of "ISO-8859-1" then your server will send:

Content-Type: text/html; charset=ISO-8859-1

in the header information, and the encoding setting in the HTTP header will stop the browser from looking inside the actual HTML for the 'meta' tag encoding information. some servers can be set up to look at the file, find the 'meta' information and override the Content-Type sent in the header to match, but many don't... especially with PHP or other CGI type files.
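
the order of precedence described above can be sketched in Python. this is a simplified model, not what any real browser runs (real browsers have extra steps and their own guessing logic), and the function names are made up for illustration.

```python
import re

def charset_from_header(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type or "", re.I)
    return match.group(1).lower() if match else None

def charset_from_meta(html_bytes):
    """Return the charset named in a <meta> tag near the top of the page, or None."""
    head = html_bytes[:1024].decode("ascii", errors="ignore")
    match = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', head, re.I)
    return match.group(1).lower() if match else None

def pick_charset(content_type, html_bytes):
    """Header first, then the meta tag, then None (the browser has to guess)."""
    return charset_from_header(content_type) or charset_from_meta(html_bytes)

page = b"<html><head><meta http-equiv='Content-Type' content='text/html; charset=x-sjis'></head></html>"
print(pick_charset("text/html", page))                      # x-sjis
print(pick_charset("text/html; charset=ISO-8859-1", page))  # iso-8859-1
print(pick_charset("text/html", b"<html></html>"))          # None
```

the second call is the problem case: the server's default charset wins and the 'meta' tag in the page is never consulted.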

with Apache, you can override the default encoding with the .htaccess file:

# override the default charset for all files in this directory...
AddDefaultCharset utf-8

# override the default charset for a particular file...
<Files "myfile.html">
AddDefaultCharset utf-8
</Files>

so, you have a few options...

if your HTTP server does not have a default encoding that it sends in the headers... just add a 'meta' tag in the <head> to specify the character set.

otherwise, change your default encoding (for the entire HTTP server, or just for the single file you're working on) to be the correct encoding for the file.
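
for the first option, here is a rough Python sketch that writes a test page whose bytes match the charset it declares. the page content, the file name test.html and the helper names are all assumptions for illustration.

```python
import os
import tempfile

# Hypothetical test page; the meta tag declares the same encoding used to save it.
PAGE = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>test</title>
</head>
<body>こんにちは</body>
</html>
"""

def save_page(path, html=PAGE):
    """Write the page as UTF-8 so the bytes match the declared charset."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(html)

def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

if __name__ == "__main__":
    # test.html in the temp directory is an arbitrary, assumed location.
    path = os.path.join(tempfile.gettempdir(), "test.html")
    save_page(path)
    print(path, is_valid_utf8(path))
```

opening the resulting file in a browser should show the Japanese text, as long as the server isn't sending a conflicting charset in its headers.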

tanuki
Posts: 2302
Joined: Sun 09.25.2005 9:00 pm
Location: South America

RE: Japanese text and Notepad/HTML

Post by tanuki » Sun 06.24.2007 10:04 pm

Thank you for your answers, guys. I was able to solve the problem. :)
Please correct my poor Japanese.
