tepples wrote:
I thought Japanese word breaks were easy to approximate: add a break between a hiragana and the kanji or katakana that immediately follows it. How well does that heuristic perform?
Wouldn't work with anything that has an お or ご prefix... though I suppose in general it's still more likely to give reasonable line breaks most of the time. You still have a problem if the break happens in a run of hiragana-only words, though.
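For anyone who wants to try the heuristic, here's a minimal sketch in Python. The function names and the Unicode ranges are my own assumptions (basic hiragana, katakana, and the main CJK ideograph block only; things like 々 or rare kanji aren't handled), so treat it as a rough illustration rather than a real line breaker.

```python
def is_hiragana(ch):
    # Hiragana block, U+3041..U+309F (assumed range for this sketch)
    return "\u3041" <= ch <= "\u309f"

def is_kanji_or_katakana(ch):
    # Katakana block U+30A0..U+30FF, or the main CJK ideograph block U+4E00..U+9FFF
    return "\u30a0" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"

def break_points(text):
    # A break is allowed before index i when a hiragana is
    # immediately followed by a kanji or katakana.
    return [
        i for i in range(1, len(text))
        if is_hiragana(text[i - 1]) and is_kanji_or_katakana(text[i])
    ]

def segment(text):
    # Split the text at every allowed break point.
    pieces, start = [], 0
    for i in break_points(text):
        pieces.append(text[start:i])
        start = i
    pieces.append(text[start:])
    return pieces

print(segment("私は日本語を勉強する"))  # the case the heuristic handles well
print(segment("お茶を飲む"))            # お prefix gets split from its noun
print(segment("これはひどい"))          # hiragana-only run: no break found at all
```

The second and third calls show the two failure cases mentioned above: the お gets cut off from 茶, and a run of hiragana-only words offers no break opportunity whatsoever.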
rainwarrior wrote:
You can see the problem in automated translation, where the same sequence of characters could be grouped into different words with different meanings (especially where words have various possible suffixes). There's a lot of ambiguity that's hard to resolve without being a human speaker of Japanese.
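The grouping ambiguity is easy to reproduce with a toy example. The sketch below enumerates every way to split a string into words from a small hand-picked word list; the list and the function name are made up for illustration and have nothing to do with how a real segmenter or translator works.

```python
def segmentations(text, words):
    # Return every way to split `text` into a sequence of entries from `words`.
    if not text:
        return [[]]
    results = []
    for word in words:
        if text.startswith(word):
            for rest in segmentations(text[len(word):], words):
                results.append([word] + rest)
    return results

# Hypothetical toy dictionary, just enough to show the ambiguity.
TOY_WORDS = ["くるま", "くる", "まで", "で", "まつ"]

for split in segmentations("くるまでまつ", TOY_WORDS):
    print(" ".join(split))
```

Both splits it finds are valid Japanese: くるま で まつ (車で待つ, "wait in the car") and くる まで まつ (来るまで待つ, "wait until someone comes"). Nothing in the characters themselves says which one was meant.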
Automated translation is awful even if you provide spaces, though. English makes a lot of context explicit, while Japanese sticks to the bare minimum required, and more often than not the grammar isn't even 100% correct (skipped particles, etc.).
Even if you could work around the grammar issue, translating to English means having to fill in that missing context somehow, which is why you end up with people writing "he is" instead of "it's" (or worse, when it should've been "I'm"), etc. You can cheat some of that, but eventually you'll have to guess to keep the translation from being too awkward.
And of course, because nobody wants to make translators for every possible language pair, Japanese to any other language is nearly always implemented as Japanese to English to that other language, which amplifies the issues even further, especially if the target language would have been easier to translate into directly (i.e. it doesn't rely as heavily on the missing context).