Some characters don't display properly in forum

Postby hihlordjp » Sun 05.29.2005 7:01 am

Hi, Clay! I don't know if this has been brought up before, but some characters won't display properly on the forum. Some kanji, some katakana. The katakana "so" won't appear properly...

例:パ?\コン (PASOKON)

I wonder what causes this...
俺様は何時か此の地球の帝王に成るぞ!
...ジョウダンだよ。ヘヘ ^^;;

「君という光が私を見つける // 真夜中に」-- 「光」という歌より(歌手:宇多田ヒカル)
hihlordjp
 
Posts: 144
Joined: Fri 02.11.2005 11:37 am

RE: Some characters don't display properly in forum

Postby clay » Sun 05.29.2005 2:33 pm

I have noticed this. I am not an expert on how the Japanese encoding works, but I think it isn't totally compatible with my database (MySQL). If anyone knows why this is so, please let me know.

For the most part it is just a minor trouble, but I would sure like to figure out how to stop it from happening.

Clay
clay
Site Admin
 
Posts: 2809
Joined: Fri 01.21.2005 9:39 am
Location: Florida

RE: just checking...

Postby zengargoyle » Mon 05.30.2005 1:51 am

so so - ・#92; ・#92;
Code:
code so - ・#92; ・#92;

so so - そ そ

this is not a mysql issue, but a php issue. the katakana 'so' in SHIFT_JIS encodes as '0203 0134' (octal), and 0134 == '\' == 92 (decimal). somewhere the php code is trying to do some magic to escape the backslash, probably to prevent SQL injection attacks. oh, and the evil breakage happens on preview, which shouldn't involve the database, so i'm pretty sure it's just a php problem. if it was a mysql problem a lot more things would likely be broken as well.

here are a couple links about the problem, and maybe a fix if you can find where in
the php code it's happening.

http://www.phpbuilder.com/lists/php-i18n/2003031/0010.php
http://php.oss.eznetsols.org/manual/it/function.stripslashes.php (post in italian
i think... but comments in english)

the fix is to have the stripslashes function only handle things like backslash-quote and
backslash-doublequote and backslash-backslash and to leave everything else alone.
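
for anyone who wants to poke at the bytes themselves, here's a rough sketch in Python (not the forum's PHP, just a convenient way to look at SHIFT_JIS bytes). it shows that katakana 'so' ends in the same byte as an ASCII backslash, and what happens when something naively deletes that byte. the `strip_backslashes` helper is a made-up stand-in for what a charset-unaware backslash stripper does, not the real PHP stripslashes.

```python
# Inspect the SHIFT_JIS bytes of katakana 'so' and imitate backslash stripping.

def strip_backslashes(raw: bytes) -> bytes:
    """Hypothetical stand-in: drop every 0x5C byte, ignoring the charset."""
    return raw.replace(b"\\", b"")

so = "ソ".encode("shift_jis")
octal = " ".join(f"{b:04o}" for b in so)   # same notation as '0203 0134'
print(octal, so[1] == ord("\\"))

word = "パソコン".encode("shift_jis")
damaged = strip_backslashes(word)
# With 0x5C gone, the lead byte 0x83 pairs up with the wrong following byte.
print(damaged.decode("shift_jis", errors="replace"))
```

the second print shows a different, still readable but wrong string, which is why this kind of damage is easy to miss in a quick test.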
zengargoyle
 
Posts: 1200
Joined: Sun 05.29.2005 10:16 pm

RE: Some characters don't display properly in forum

Postby clay » Tue 05.31.2005 10:08 pm

Thanks!

I will read up on that. I deeply appreciate your help. So it may be a matter of taking out the stripslashes command? I imagine that could cause problems if people use quotes or other system characters?

Sorry, I should read up on this before making a comment. :)

Clay

RE: Some characters don't display properly in forum

Postby zengargoyle » Wed 06.01.2005 2:47 pm

found the english docs...
http://www.php.net/manual/en/function.stripslashes.php

you can try this:

check out http://www.php.net/manual/en/ref.info.php#ini.magic-quotes-gpc
and then make a quick check by turning off magic_quotes_gpc in your php.ini file
(it's on by default). make a test post and see if it works.

since you said something about using MySQL, take a look at http://www.php.net/manual/en/function.mysql-real-escape-string.php
there's some code there that shows how to handle safe updates to the database,
and some examples of why at least some escaping is needed to prevent SQL injection
attacks.

theoretically, if fusion_forum is written well (uses the mysql_real_escape_string function
consistently...) then you could probably do without magic_quotes_gpc.

hopefully the magic_quotes_gpc test will at least confirm where the problem is. if needed
i can take a look at the post.php script. i'm not a PHP person (Perl is sooo much better :P) but i think i could hack something workable from the info found so far.
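
as a rough illustration of the "safe updates" idea, here's a sketch in Python with the built-in sqlite3 module standing in for MySQL (an assumption made only so the example runs anywhere; the `posts` table and its `message` column are made up). it uses placeholder parameters, which is a different technique from mysql_real_escape_string, but the point is the same: the database layer takes care of quoting, so the application doesn't need a blanket escaping pass over every input.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (message TEXT)")

# Text with a quote, a backslash and katakana 'so'; none of it is escaped by hand.
message = "it's a \\ test: パソコン"
conn.execute("INSERT INTO posts (message) VALUES (?)", (message,))

stored = conn.execute("SELECT message FROM posts").fetchone()[0]
print(stored == message)
```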

RE: Some characters don't display properly in forum

Postby clay » Thu 06.02.2005 10:53 am

ok I just turned it off:

パ・#92;コン

・#92; test

hmmm... katakana 'so' is still giving trouble, but quotes "' are ok...
Last edited by clay on Thu 06.02.2005 10:54 am, edited 1 time in total.

RE: Some characters don't display properly in forum

Postby zengargoyle » Thu 06.02.2005 8:05 pm

ok, new theory... this time i'm pretty
sure i've got it figured out. i've done
quite a bit of testing.

this site uses SHIFT_JIS encoding, so
normal ASCII stuff is one byte in the
7-bit range (less than 0x7F -ish)

the kana/kanji are encoded in multiple
bytes, with the first byte in the 8-bit
range (0x80 and up).

katakana 'so' is encoded with the
2 byte sequence (0x83 0x5c). now
0x5c in plain ASCII is a backslash '\'
what's happening is this:

the 0x5c is being escaped using
HTML escapes. 0x5c is 92 decimal.
so the escape for a backslash is '&#92;'
so now our 'so' has been mutated from
a 2 byte sequence 0x83 0x5c
into a six byte sequence:
0x83 0x26(&) 0x23(#) 0x39(9) 0x32(2) 0x3b(;)

now in SHIFT_JIS, the 2 byte sequence
0x83 0x26 isn't a valid character (0x26 is
too low to be a second byte), so it shows up
as a placeholder like '・' or '?' and the other
4 characters being plain ASCII (less than 0x7F)
display normally as '#92;'

so our 'so' is turned into 「・#92;」 '・#92;'
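
the same mutation can be reproduced at the byte level. a small Python sketch (Python is only used because it makes the bytes easy to see; `escape_backslash_bytes` is an illustrative name, not a function from php_fusion):

```python
def escape_backslash_bytes(raw: bytes) -> bytes:
    """Replace every 0x5C byte with the HTML escape, ignoring the charset."""
    return raw.replace(b"\\", b"&#92;")

so = "ソ".encode("shift_jis")            # the 2 byte form of katakana 'so'
mangled = escape_backslash_bytes(so)     # what a byte-wise replace leaves behind
print(so.hex(" "), "->", mangled.hex(" "))

# The lead byte is no longer followed by a valid second byte, so a decoder
# has to substitute some placeholder for it.
print(mangled.decode("shift_jis", errors="replace"))
```

Python's decoder replaces only the orphaned lead byte and keeps the '&', while the forum display appears to swallow the '&' along with it. either way the original character is gone.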

so, create a post with a single katakana 'so',
then look at the MySQL database record where
the post has been stored. if there are six or so
bytes then the damage is done before it's put
into the database. if the database has only a
couple of characters for the post then the damage
is done after fetching from the database and before
displaying. i'm betting on the former.

so... look into post.php and try to find where
it retrieves the text from this post textarea into
a variable. then try to find a place where it does
some sort of html escaping. probably to keep
people from using HTML to embed images, links
and such. see if you can disable it for another
test.

i would take a closer look, but i can't find the
php_fusion v5 source, it's no longer available
on the php_fusion site.

RE: Some characters don't display properly in forum

Postby clay » Thu 06.02.2005 9:30 pm

?#92;

RE: Some characters don't display properly in forum

Postby zengargoyle » Fri 06.03.2005 2:15 pm

heh, sorry if i'm keeping you busy
or leading on wild goose chases.


the code for php_fusion 6.0 is finally
available and i've taken a quick look
this morning.

Code:
/* from postedit.php */
if (isset($_POST['previewchanges'])) {
        $disable_smileys_check = isset($_POST['disable_smileys']) ? " checked" : "";
        $del_check = isset($_POST['delete']) ? " checked" : "";
        opentable($locale['405']);
        $subject = trim(stripinput(censorwords($_POST['subject'])));
        $message = trim(stripinput(censorwords($_POST['message'])));
        if ($subject == "") $subject = $pdata['post_subject'];
        if ($message == "") {
                $previewmessage = $locale['421'];
        } else {
                $previewmessage = $message;
                if (!$disable_smileys_check) { $previewmessage = parsesmileys($previewmessage); }
                $previewmessage = parseubb($previewmessage);
                $previewmessage = nl2br($previewmessage);
        }
/* ...snip...*/


you can see in there that it first censors some words
from the posts, then it does a 'stripinput', then it trims
whitespace from the front/back. then some smiley stuff,
some ubb stuff, and newline to html br stuff.

NOTE: it seems that the code tag doesn't do such a
good job of preserving its contents (surprise surprise)
all of the '\' in the code got turned into Yen signs (ha)
and all of the & # 9 2 ; like html entities got turned back
into slashes which then get turned into Yen signs (double ha).

Code:
/* from maincore.php */
// Strip Input Function, prevents HTML in unwanted places
function stripinput($text) {
        if (QUOTES_GPC) $text = stripslashes($text);
        $search = array("\"", "'", "\\", '\"', "\'", "<", ">", "&nbsp;");
        $replace = array("&quot;", "&#39;", "&#92;", "&quot;", "&#39;", "&lt;", "&gt;", " ");
        $text = str_replace($search, $replace, $text);
        return $text;
}


and *bam* stripinput is replacing '\' with '&#92;'.

there are a few other functions in maincore.php that do
the same sort of thing.

Code:
// htmlentities is too agressive so we use this function
function phpentities($text) {
        $search = array("&", "\"", "'", "\\", "<", ">");
        $replace = array("&amp;", "&quot;", "&#39;", "&#92;", "&lt;", "&gt;");
        $text = str_replace($search, $replace, $text);
        return $text;
}


// Trim a line of text to a preferred length
function trimlink($text, $length) {
        $dec = array("\"", "'", "\\", '\"', "\'", "<", ">");
        $enc = array("&quot;", "&#39;", "&#92;", "&quot;", "&#39;", "&lt;", "&gt;");
        $text = str_replace($enc, $dec, $text);
        if (strlen($text) > $length) $text = substr($text, 0, ($length-3))."...";
        $text = str_replace($dec, $enc, $text);
        return $text;
}



of course this is looking at v6.0 code instead of
v5.0 but it's likely about the same, or will maybe
give you a better idea of what to look for. check
the post*.php files for things that get done to
the message and then check those functions for
things involving backslashes and the evil &#92;

i'm pretty certain that it's the stripinput function
that is doing the evil assuming v6 and v5 aren't
that different. so if v5 has the same function,
try removing the "\\", from search and the "& # 92 ;",
from the replace and see what happens.
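
a quick sanity check on why only the backslash entry should need to go: a SHIFT_JIS second byte is always in 0x40-0x7E or 0x80-0xFC, so an ASCII character below 0x40 can never be mistaken for half of a kana or kanji. the Python sketch below runs the single characters that the v6 stripinput searches for through that test (`can_be_trail_byte` is just an illustrative helper, and this assumes v5 searches for the same characters):

```python
def can_be_trail_byte(value: int) -> bool:
    """True if the byte value is a legal second byte of a SHIFT_JIS pair."""
    return 0x40 <= value <= 0x7E or 0x80 <= value <= 0xFC

# Single characters that stripinput searches for.
search_chars = ['"', "'", "\\", "<", ">"]

# Characters whose byte value can also appear inside a 2 byte character.
risky = [ch for ch in search_chars if can_be_trail_byte(ord(ch))]
print(risky)
```

only the backslash comes out as risky. the quote and angle bracket replacements never hit the inside of a 2 byte character, so they can stay.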

RE: Some characters don't display properly in forum

Postby clay » Sun 06.05.2005 12:40 pm

Test ・#92; ・#92; 冬の・#92;ナタ ・#92; ?#92;ナタ
Last edited by clay on Mon 06.06.2005 1:08 pm, edited 1 time in total.

RE: Some characters don't display properly in forum

Postby clay » Mon 06.06.2005 4:18 pm

Testing
冬のャiタ

RE: Some characters don't display properly in forum

Postby hihlordjp » Tue 06.07.2005 10:58 am

Great work, guys! ソ isn't elusive anymore! hooray! I hope the kanji don't give us trouble...

RE: Some characters don't display properly in forum

Postby zengargoyle » Tue 06.07.2005 11:26 am

these *should* be all of the things that were broken.
according to the character maps from my system there
aren't any other SHIFT_JIS characters that would get
munged by the forum software. (knock on wood).

81 5c HORIZONTAL BAR
83 5c KATAKANA LETTER SO
84 5c CYRILLIC CAPITAL LETTER YERU
89 5c <CJK>
8a 5c <CJK>
8b 5c <CJK>
8c 5c <CJK>
8d 5c <CJK>
8e 5c <CJK>
8f 5c <CJK>
90 5c <CJK>
91 5c <CJK>
92 5c <CJK>
93 5c <CJK>
94 5c <CJK>
95 5c <CJK>
96 5c <CJK>
97 5c <CJK>
98 5c <CJK>
99 5c <CJK>
9a 5c <CJK>
9b 5c <CJK>
9c 5c <CJK>
9d 5c <CJK>
9e 5c <CJK>
9f 5c <CJK>
e0 5c <CJK>
e1 5c <CJK>
e2 5c <CJK>
e3 5c <CJK>
e4 5c <CJK>
e5 5c <CJK>
e6 5c <CJK>
e7 5c <CJK>
e8 5c <CJK>
e9 5c <CJK>
ea 5c <CJK>
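
a list like this can be regenerated rather than read off a character map. a minimal Python sketch, assuming Python's built-in shift_jis codec matches the character maps used above (SHIFT_JIS variants such as cp932 define extra characters and would likely give a longer list):

```python
import unicodedata

affected = []
for lead in range(0x81, 0xFD):
    try:
        text = bytes([lead, 0x5C]).decode("shift_jis")
    except UnicodeDecodeError:
        continue  # no character is assigned to this pair
    # Single-byte leads (half-width katakana) decode as two characters,
    # the kana plus a real backslash, so they are not affected.
    if len(text) == 1:
        affected.append((lead, text))

for lead, ch in affected:
    print(f"{lead:02x} 5c {unicodedata.name(ch, '<unnamed>')}")
```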
Last edited by zengargoyle on Tue 06.07.2005 11:27 am, edited 1 time in total.

RE: Some characters don't display properly in forum

Postby hihlordjp » Wed 06.08.2005 1:50 am

I remember that the kanji wouldn't display properly. Now it does! Yipee!

RE: Some characters don't display properly in forum

Postby clay » Wed 06.08.2005 9:21 am

:) A BIG THANKS TO zengargoyle for all his hard work. There is no way I could have figured it out. :o

