
Re: [edict-jmdict] OT: detecting proper nouns



G'day,

> Additionally, if no UTF-8 encoding can be output by the tool of your
> choosing, you can always opt to simply pipe the output through a
> character-encoding converter tool such as iconv as well:
>
> some_command_that_outputs_euc_jp | iconv -f EUC-JP -t UTF-8 > output.txt
>
> ..or something similar.

This fails spectacularly, though, when the text has characters mixed in
that can't be represented in EUC-JP, which is not so uncommon these days.
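Here is a minimal sketch of the problem. It simulates the EUC-JP leg of the pipeline with Python's built-in `euc_jp` codec rather than iconv itself, so iconv's own messages and flags will differ. The function name `through_euc_jp` and the sample strings are made up for illustration; Korean Hangul stands in for "text that can't be represented in EUC-JP".

```python
# Sketch only: mimics "tool emits EUC-JP | iconv -f EUC-JP -t UTF-8"
# using Python's euc_jp codec instead of iconv (an assumption).

def through_euc_jp(text: str, errors: str = "strict") -> str:
    """Hypothetical helper: encode text as EUC-JP (what the upstream tool
    would have to emit), then decode it back to str (what iconv would
    pass on as UTF-8)."""
    return text.encode("euc_jp", errors=errors).decode("euc_jp")


japanese_only = "日本語の辞書"
mixed = "日本語と한국어"  # Hangul has no mapping in EUC-JP

# Pure Japanese survives the round trip unchanged.
print(through_euc_jp(japanese_only))

# Mixed text: a strict encoder refuses outright...
try:
    through_euc_jp(mixed)
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.object[exc.start:exc.end])

# ...while a lenient encoder silently swaps the Hangul for '?'.
print(through_euc_jp(mixed, errors="replace"))
```

Either way the Korean never makes it into output.txt. A strict conversion stops at the first Hangul character, and a lenient one quietly turns it into question marks.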

-- 
Francis Bond <http://www2.nict.go.jp/x/x161/en/member/bond/>
NICT Language Infrastructure Group