
Re: [edict-jmdict] More about parsing sentences



[Answering this out-of-order ...]

On 21/08/07, Jim Rose <jim@kanjicafe.com> wrote:
> I'm seeing 役割 doesn't get parsed right:
>
>  Should be:
>
>  役割(P); 役割り 【やくわり】 (n) part; assigning (allotment
>  of) parts; role; duties; (P)
>
>  But instead it's cut into two entries:
>
>  BCはさらに、水面で休息する際や非常時のための救
>  命胴衣的な役割も
>
>  さらに、 (adv,conj) furthermore; again; after all; more and more;
>  moreover; KD
>  水面 【すいめん(P); みなも; みのも】 (n) water's
>  surface; (P)
>  休息 【きゅうそく】 (n,vs) rest; relief; relaxation; (P)
>  際 【さい】 (n-adv,n) on the occasion of; circumstances; (P)
>  非常時 【ひじょうじ】 (n) (time of) emergency; crisis
>  のための (thing which) is on account of; KD
>  救命胴衣 【きゅうめいどうい】 (n) life-jacket
>  的 【てき】 (adj-na,suf) -like; typical; (P)
>  役 【やく】 (n,n-suf) use; service; role; position; (P)
>  割 【かつ】 (n) divide; cut; halve; separate; split; rip; break;
>  crack; smash; dilute; (P)

I can't reproduce this one. When I put the above text into WWWJDIC, I get
(for the latter part):

# のための (thing which) is on account of; KD
# 救命胴衣 【きゅうめいどうい】 (n) life-jacket; ED
# 的 【てき; まと】 (てき) (adj-na,suf) -like; typical; (まと) (n) mark; target; SP
# 役割 【やくわり】 (n) part; assigning (allotment of) parts; role; duties; (P); EP

which is what you expect.

One thing that can cause 役割 to be broken up is the presence of
whitespace characters between the two kanji in the original. I try to
ignore them, but something like a "<br>" in the original might throw
it off.
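
To illustrate the kind of clean-up involved, here is a minimal sketch in
Python that removes whitespace and "<br>" tags from a Japanese sentence
before it is matched against the dictionary. This is not WWWJDIC's
actual code; the function name strip_breaks and the pattern are
assumptions made for the example, and stripping all whitespace is only
sensible for text with no space-delimited words.

```python
import re

# Hypothetical pre-processing step, not WWWJDIC's real implementation.
# Matches <br>, <br/>, <BR /> etc., or any run of whitespace. For str
# patterns, \s also covers the ideographic space U+3000.
_BREAKS = re.compile(r"<br\s*/?>|\s+", re.IGNORECASE)

def strip_breaks(text: str) -> str:
    """Return text with <br> tags and all whitespace removed."""
    return _BREAKS.sub("", text)

cleaned = strip_breaks("救命胴衣的な役<br>\n割も")
print(cleaned)
```

After this step the two kanji of 役割 are adjacent again, so a
dictionary lookup can see them as one candidate word.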

>  Any chance you could make a word file ordered by size (largest first)
>  and then scan sentences for matches?  Might make this particular bug
>  go away by matching largest first.                  

WWWJDIC does use a form of longest-first already.
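
For readers unfamiliar with the idea, the following is a rough sketch of
greedy longest-match-first segmentation in Python. It is a generic
illustration only, not the algorithm WWWJDIC actually uses; the function
name segment and the tiny LEXICON set are made up for the example.

```python
# Illustrative lexicon only; a real one would come from the dictionary file.
LEXICON = {"のための", "救命胴衣", "的", "役割", "役", "割"}

def segment(text, lexicon):
    """Greedy longest-match-first segmentation (generic sketch)."""
    max_len = max(map(len, lexicon), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                out.append(text[i:i + n])
                i += n
                break
        else:
            # No entry starts here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return out

print(segment("のための救命胴衣的な役割も", LEXICON))
print(segment("役 割", LEXICON))
```

In the first call 役割 is matched as one entry because the two-character
candidate is tried before the single characters. In the second call the
intervening space prevents that match, so 役 and 割 come out separately,
which is the failure mode described above.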

Jim

-- 
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/