Re: [edict-jmdict] More about parsing sentences
[Answering this out-of-order ...]
On 21/08/07, Jim Rose <jim@kanjicafe.com> wrote:
> I'm seeing 役割 doesn't get parsed right:
>
> Should be:
>
> 役割(P); 役割り 【やくわり】 (n) part; assigning (allotment
> of) parts; role; duties; (P)
>
> But instead it's cut into two entries:
>
> BCはさらに、水面で休息する際や非常時のための救
> 命胴衣的な役割も
>
> さらに、 (adv,conj) furthermore; again; after all; more and more;
> moreover; KD
> 水面 【すいめん(P); みなも; みのも】 (n) water's
> surface; (P)
> 休息 【きゅうそく】 (n,vs) rest; relief; relaxation; (P)
> 際 【さい】 (n-adv,n) on the occasion of; circumstances; (P)
> 非常時 【ひじょうじ】 (n) (time of) emergency; crisis
> のための (thing which) is on account of; KD
> 救命胴衣 【きゅうめいどうい】 (n) life-jacket
> 的 【てき】 (adj-na,suf) -like; typical; (P)
> 役 【やく】 (n,n-suf) use; service; role; position; (P)
> 割 【かつ】 (n) divide; cut; halve; separate; split; rip; break;
> crack; smash; dilute; (P)
I can't reproduce this one. When I put the above text into WWWJDIC, I get
(for the latter part):
# のための (thing which) is on account of; KD
# 救命胴衣 【きゅうめいどうい】 (n) life-jacket; ED
# 的 【てき; まと】 (てき) (adj-na,suf) -like; typical; (まと) (n) mark; target; SP
# 役割 【やくわり】 (n) part; assigning (allotment of) parts; role; duties; (P); EP
which is what you expect.
One thing that can cause 役割 to be broken up is the presence of
whitespace characters between the two kanji in the original. I try to
ignore them, but something like a "<br>" in the original might throw it.
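As an illustration of the point above, here is a minimal sketch of a
clean-up step that could run on the text before it is parsed. It is not
WWWJDIC's actual code; the function name normalise and both patterns are
assumptions made for this example. It removes "<br>" tags and every
whitespace character, including the full-width space U+3000.

```python
import re

# Hypothetical pre-processing step; NOT taken from WWWJDIC itself.
_BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)  # <br>, <BR>, <br/>, <br />
_WHITESPACE = re.compile(r"\s+")  # for str patterns, \s covers Unicode whitespace


def normalise(text):
    """Return text with <br> tags and all whitespace removed."""
    text = _BR_TAG.sub("", text)
    return _WHITESPACE.sub("", text)


print(normalise("救命胴衣的な役<br>\n割も"))
```

Removing all whitespace is only reasonable for text that is entirely
Japanese; mixed Japanese/English input would need a gentler rule.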
> Any chance you could make a word file ordered by size (largest first)
> and then scan sentences for matches? Might make this particular bug
> go away by matching largest first.
WWWJDIC does use a form of longest-first already.
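For readers unfamiliar with the idea, the following is a generic sketch
of greedy longest-first matching. It is not WWWJDIC's implementation,
whose details are not given here; the function segment, the max_len
parameter and the tiny LEXICON set are assumptions for illustration only.
At each position it tries the longest candidate first and falls back to
a single character when nothing in the lexicon matches.

```python
def segment(text, lexicon, max_len=8):
    """Split text greedily, preferring the longest lexicon entry at each position."""
    pieces = []
    i = 0
    while i < len(text):
        # Try candidates from longest to shortest.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            # A single character is always accepted so the scan can advance.
            if n == 1 or candidate in lexicon:
                pieces.append(candidate)
                i += n
                break
    return pieces


# Toy lexicon (illustrative only); it contains 役 and 割 as well as 役割.
LEXICON = {"のための", "救命胴衣", "的", "役割", "役", "割"}

print(segment("のための救命胴衣的な役割も", LEXICON))
print(segment("役 割", LEXICON))
```

With this toy lexicon the first call keeps 役割 as one piece even though
役 and 割 are also entries, while the second call, with a space between
the two kanji, yields them separately. That is the kind of split
described in the original report.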
Jim
--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/