Re: [edict-jmdict] OT: detecting proper nouns
G'day,
> Additionally, if no UTF-8 encoding can be output by the tool of your
> choosing, you can always opt to simply pipe the output through a
> character-encoding converter tool such as iconv as well:
>
> some_command_that_outputs_euc_jp | iconv -f EUC-JP -t UTF-8 > output.txt
>
> ..or something similar.
This fails spectacularly, though, when the stream has text mixed in that
can't be represented in EUC-JP, which is not so uncommon these days.
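
To make the failure mode concrete, here is a minimal Python sketch (not the iconv pipeline itself) of what happens when a stream assumed to be EUC-JP contains a byte sequence that is not valid EUC-JP. The sample bytes and the helper name decode_euc_jp are made up for illustration; the stray 0xFF byte stands in for whatever foreign data ended up in the stream. A strict conversion aborts, while a lenient one marks the bad spot and carries on.

```python
def decode_euc_jp(raw: bytes) -> str:
    """Decode bytes as EUC-JP, putting U+FFFD wherever a byte is invalid.

    Hypothetical helper for illustration only.
    """
    return raw.decode("euc_jp", errors="replace")


# Valid EUC-JP text followed by a byte (0xFF) that is never legal in EUC-JP.
good = "日本語".encode("euc_jp")
mixed = good + b"\xff" + b"text"

# A strict conversion gives up on the whole input at the first bad byte.
try:
    mixed.decode("euc_jp")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# A lenient conversion keeps the readable parts and flags the damage.
recovered = decode_euc_jp(mixed)
print(strict_failed, recovered)
```

Whether silently replacing bad bytes is acceptable depends on the data; for dictionary work it is probably better to log the offending lines and fix them at the source.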
--
Francis Bond <http://www2.nict.go.jp/x/x161/en/member/bond/>
NICT Language Infrastructure Group