reading assistant

Posts: 1
Joined: Mon 10.11.2010 10:37 am
Native language: English


Post by gh7 » Mon 10.11.2010 10:48 am

I've been developing a new Japanese reading assistant that is similar to programs like Rikaichan and Reading Tutor but also shows a lot of information about Japanese grammar. Although it is still pretty early in development, I'm trying to collect a lot of feedback to see what other people think about it. For anyone interested, the program can be used at:

Just click on one of the sample texts to begin using it. Since this is still an early version, some grammar points don't have any information displayed yet, but I will be adding more later. If anyone has any trouble, please look at the guide or just reply here. There is also a short survey on the site that I'm currently using to collect feedback. Any other comments or questions are always welcome!


Posts: 493
Joined: Tue 11.20.2007 2:26 pm
Native language: English
Gender: Male

Re: reading assistant

Post by Hyperworm » Mon 10.11.2010 1:57 pm

I hope you can forgive my criticisms here - my first instinct at seeing something like this is to point out the nasty problems. Natural language parsing is messy...

So you can't just provide verb conjugations; as you know, those are a small subset of Japanese grammar. The problem is that a number of the grammar items you have yet to add are ambiguous. How will you cope with those? Simply not providing explanations for a lot of grammar is no good. You can't assume that the reader already knows certain grammar: that defeats the purpose of building or using this tool, since the ambiguous grammar isn't always the simple grammar, and it's hard to say what "simple" grammar is in the first place. And if you do provide explanations, you may mislead the reader whenever the tool picks the wrong reading.

For example (these particles all occur in your current texts):
と: You probably need to explain "Quoting と", but it is ambiguous with the 'and' と.
でも: "but", or "even", or something else?
には: "In order to" is not obvious and deserves explaining, but is ambiguous with "in" (with contrast)
である: needs explaining as equal to だ, but will be confused occasionally with <place>である(a certain)<object>を・・・
し: is it the masu stem of する, or the particle?
が: "but", or the subject particle?
や: "non-exhaustive and", or "Osakan だ"?
という: "says", or its other function?
んです: confusable in all-hiragana sentences...
として: te-form of とする, or the として that means "as"?
What about an example text with lots of contractions and slang in it?
Will you be able to cope with the dropped い in contractions like てる (for ている) and てく (for ていく)?
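On the contractions: one way to cope, sketched here under the assumption that a later stage (dictionary lookup or the parser) can reject readings that don't make sense, is to generate both the literal and the expanded reading for every てる/でる/てく/でく rather than blindly rewriting them. `CONTRACTIONS` and `candidate_expansions` are hypothetical names, and the table is deliberately tiny.

```python
from itertools import product

# Hypothetical table: colloquial contractions that drop the い of いる / いく.
CONTRACTIONS = {"てる": "ている", "でる": "でいる", "てく": "ていく", "でく": "でいく"}

def candidate_expansions(text: str) -> list[str]:
    """Every way of reading each contraction as either literal or expanded."""
    # Locate non-overlapping contraction sites, scanning left to right.
    sites: list[tuple[int, str]] = []
    i = 0
    while i < len(text):
        for key in CONTRACTIONS:
            if text.startswith(key, i):
                sites.append((i, key))
                i += len(key)
                break
        else:
            i += 1
    # Each site is independently kept literal (False) or expanded (True).
    results = []
    for choice in product([False, True], repeat=len(sites)):
        pieces, prev = [], 0
        for (start, key), expand in zip(sites, choice):
            pieces.append(text[prev:start])
            pieces.append(CONTRACTIONS[key] if expand else key)
            prev = start + len(key)
        pieces.append(text[prev:])
        results.append("".join(pieces))
    return results

print(candidate_expansions("見てる間に出てく"))
```

Note that 捨てる also yields the bogus 捨ている, which is exactly why blind replacement is dangerous: the ambiguity has to be resolved downstream. The candidate count doubles with each contraction site, so a real tool would want to prune early.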

The point is: It's very hard to evaluate how usable and useful this tool will be without seeing how it presents that kind of grammar to the user - that's where it will rise or fall in my opinion, and not on the more easily predictable verb forms.

It might not be possible to automatically disambiguate all - or even any - of those, but hopefully you can find an intelligent presentation so the user can quickly get information on the interpretation that seems most sensible to them. (Also, different interpretations may involve different word segmentation.)
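As a rough illustration of that kind of presentation, here is a minimal sketch assuming a hand-written table of ambiguous surface forms. `Interpretation`, `AMBIGUOUS`, `interpretations` and `render` are all hypothetical; a real tool would get its candidates from the parser. Each candidate carries its own segmentation, so として can be shown both as one unit and as と + して.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interpretation:
    segments: tuple[str, ...]  # how the surface string is split
    gloss: str                 # short explanation shown to the user

# Hypothetical hand-written table of a few ambiguous items.
AMBIGUOUS = {
    "として": [
        Interpretation(("として",), "compound particle: 'as, in the role of'"),
        Interpretation(("と", "して"), "te-form of とする (e.g. 'trying to' after a volitional form)"),
    ],
    "には": [
        Interpretation(("には",), "'in order to' (after a dictionary-form verb)"),
        Interpretation(("に", "は"), "'in/at' + contrastive は"),
    ],
    "し": [
        Interpretation(("し",), "conjunctive particle listing reasons: 'and what's more'"),
        Interpretation(("し",), "masu-stem of する"),
    ],
}

def interpretations(surface: str) -> list[Interpretation]:
    """All candidate readings, so the UI can show them side by side."""
    return AMBIGUOUS.get(surface, [Interpretation((surface,), "(no alternatives known)")])

def render(surface: str) -> str:
    """Plain-text listing of each numbered reading with its segmentation."""
    lines = [f"{surface}:"]
    for n, interp in enumerate(interpretations(surface), 1):
        lines.append(f"  {n}. {' + '.join(interp.segments)}: {interp.gloss}")
    return "\n".join(lines)

print(render("として"))
```

When the tool can't decide, the reader sees every numbered reading and picks the one that fits; if the tool can rank them, it only has to reorder the list.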

Further thoughts on the current presentation:

When looking at verb info:
a) Don't add spaces between the verb parts at the top (just draw the boxes around them without spacing them out). I've never seen していて written as し てい て in any form of Japanese text, even for children. It's misleading to suggest it can be broken up that way. If anything, it's して いて.
b) The words "verb" and "form" in the "modifications" area are mostly redundant. Personally, I'd change "Verb past polite form (Vました)" to "Polite past (Vました)" to emphasize the important information, but it's up to you.
c) Would it be technically possible to include the precise verb in the conjugation info? (When looking at させられた: first box, "する→させられる (Exception)"; second box, "させられる→させられた (ru verb)")
d) I don't think 出さねばならない and うかがえる should be recognized under なければならない and られる, since they conjugate differently. I'd create separate items to show the conjugation for both.
e) The masu-stem should count as a conjugation on its own. It links clauses the same way the te-form does (especially before a comma).
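For (c) and (e), a minimal sketch of what I mean, assuming the analyzer already knows each verb's conjugation class: keep every intermediate form so each box can name the exact verb it applies to, and let the masu-stem be a step of its own. All names here are hypothetical, and each rule handles only the class named in its comment.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Step:
    source: str      # form before this step
    result: str      # form after this step
    label: str       # what the step does, e.g. "Past"
    verb_class: str  # conjugation class governing this step

def causative_passive_suru(verb: str) -> Step:
    # Irregular する (incl. compounds like 勉強する): する → させられる
    if not verb.endswith("する"):
        raise ValueError(f"not a する verb: {verb}")
    return Step(verb, verb[:-2] + "させられる", "Causative-passive", "Exception")

def ichidan_past(verb: str) -> Step:
    # Ichidan (ru verb) past: drop る, add た. Wrong for godan verbs like 帰る.
    if not verb.endswith("る"):
        raise ValueError(f"not a ru verb: {verb}")
    return Step(verb, verb[:-1] + "た", "Past", "ru verb")

def ichidan_masu_stem(verb: str) -> Step:
    # Ichidan masu-stem: drop る (食べる → 食べ); can link clauses on its own.
    if not verb.endswith("る"):
        raise ValueError(f"not a ru verb: {verb}")
    return Step(verb, verb[:-1], "Masu-stem", "ru verb")

def derive(verb: str, *rules: Callable[[str], Step]) -> list[Step]:
    """Apply rules in order, keeping every intermediate form for display."""
    steps, current = [], verb
    for rule in rules:
        step = rule(current)
        steps.append(step)
        current = step.result
    return steps

for step in derive("する", causative_passive_suru, ichidan_past):
    print(f"{step.source}→{step.result} ({step.verb_class}): {step.label}")
```

This reproduces the two boxes from (c), and `derive("勉強する", causative_passive_suru, ichidan_masu_stem)` stops at 勉強させられ, the stem form that links clauses.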


By the way, not to doom this before it's really got going, but if applying this to text in general doesn't work out (for parsing reasons, speed reasons or whatever), a system that takes Japanese text with the senses/roles of its grammar parts and words pre-tagged, and displays that information to the user in an intuitive way, would still be extremely useful on its own for any site that presents stories or other text for learning purposes (the stories on this site and the video game scripts on LLTVG are just two examples).
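To make that concrete, here is a minimal sketch assuming a made-up inline markup, `{surface|note}`, where an editor has already tagged each ambiguous item with its intended sense. The markup and the `parse` function are hypothetical. `parse` strips the tags to recover the unspaced sentence (no artificial breaks, per (a)) and returns each note with its character offset, so a page could attach it as a popup.

```python
import re

# Hypothetical markup: {surface|note}. Untagged text passes through unchanged.
TAG = re.compile(r"\{([^{}|]+)\|([^{}]+)\}")

def parse(tagged: str) -> tuple[str, list[tuple[int, str, str]]]:
    """Return the plain sentence and (offset, surface, note) for each tagged span."""
    pieces: list[str] = []
    notes: list[tuple[int, str, str]] = []
    pos = 0     # read position in the tagged string
    length = 0  # length of plain text emitted so far
    for m in TAG.finditer(tagged):
        before = tagged[pos:m.start()]
        pieces.append(before)
        length += len(before)
        surface, note = m.group(1), m.group(2)
        notes.append((length, surface, note))
        pieces.append(surface)
        length += len(surface)
        pos = m.end()
    pieces.append(tagged[pos:])
    return "".join(pieces), notes

plain, notes = parse("駅に行く{には|'in order to' after a dictionary-form verb}バスが便利だ")
print(plain)
for offset, surface, note in notes:
    print(f"{offset}: {surface} = {note}")
```

Because the tags are resolved by a human, the display layer never has to guess, and the same renderer could later be fed by an automatic tagger if parsing improves.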

Good luck :D
