
Re: [edict-jmdict] <xref> tags without a destination or with too many destinations

To: <edict-jmdict@***************>
Subject: Re: [edict-jmdict] <xref> tags without a destination or with too many destinations
From: Jean-Luc Léger <jean-luc.leger@*********>
Date: Thu, 04 Nov 2010 14:07:48 +0100

On Wed, 03 Nov 2010 20:32:42 -0600, Stuart McGraw <smcg4191@frii.com>
wrote:
> On 11/03/2010 01:43 PM, Jean-Luc Léger wrote:
>>[...]
>> Would be glad to see those sense Ids in the Jmdict file.
>> Be it a small number, a big number or a string : I don't care.
> 
> I'm interested in how you and Glen (or anyone else) would use them.
> The reason I ask is because I thought a bit about adding them in
> the database long ago but decided then not to because a sense doesn't 
> seem to have much of an identity -- it is really just a container 
> for glosses (and maybe a sense note) and as such, one container 
> is as good as another.  I had not then realized that they do have
> identities by virtue of being pointed to by xrefs.  But I wonder
> if there are any other reasons?  

Well, take the Tanaka Corpus file. It links to entries through headwords
and to senses through the ordinal position of the sense in its entry.
The big problem is that those ordinal positions may change: a new sense can
be inserted, or senses can be reordered, without changing the content of
the referred sense.
So it must require a lot of maintenance.
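
To make the fragility concrete, here is a minimal sketch. The data layout
(a list of glosses and a headword + position pair) is made up for
illustration and is not the actual Tanaka Corpus format.

```python
# Hypothetical data: the senses of one entry, referenced by 1-based position.
# This illustrates position-based links only; it is not the real file format.
senses = ["time", "thyme"]
ref = ("タイム", 2)  # intended target: the "thyme" sense


def resolve(senses, ref):
    """Return the gloss that a position-based reference currently points to."""
    _headword, position = ref
    return senses[position - 1]


before = resolve(senses, ref)

# An editor inserts a new sense in first position; the reference is not updated.
senses.insert(0, "new sense for testing")
after = resolve(senses, ref)

print(before, "->", after)
```

The reference itself never changed, yet it now resolves to a different
sense.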

JMDictDB has the same problem with xrefs to a specific sense. I have just
run a small test in the test database.
Before the test:

タイム [ gai1,ichi1]
1. 	[n]
▶ time
References to this sense:
  	see: 2083020 たんま 1.(children's language) to interrupt a game;time out
2. 	[n]
▶ thyme
References to this sense:
  	see: 2187990 立麝香草 【 タチジャコウソウ 】 1.thyme
  	see: 2188000 木立ち百里香 【 きだちひゃくりこう 】 1.thyme 

and 
立麝香草
【 タチジャコウソウ ( nokanji ) ； たちじゃこうそう 】
1. 	[n] [uk]
▶ thyme
Cross references:
  	see: 1076010 タイム 2.thyme 

Now what happens to those xrefs when I add a new sense to タイム in first
position (so moving sense 1 to 2 and sense 2 to 3)?
The answer is here:

http://www.edrdg.org/~smg/cgi-bin/entr.py?e=1032953&svc=jmtest&sid=
http://www.edrdg.org/~smg/cgi-bin/entr.py?svc=jmtest&sid=&e=1032951

タイム [ gai1,ichi1]
1. 	[n]
▶ new sense for testing
References to this sense:
  	see: 2083020 たんま 1.(children's language) to interrupt a game;time out  <----- wrong sense
2. 	[n]
▶ time
References to this sense:
  	see: 2188000 木立ち百里香 【 きだちひゃくりこう 】 1.thyme  <----- wrong sense
  	see: 2187990 立麝香草 【 タチジャコウソウ 】 1.thyme
3. 	[n]
▶ thyme

立麝香草
【 タチジャコウソウ ( nokanji ) ； たちじゃこうそう 】
1. 	[n] [uk]
▶ thyme
Cross references:
  	see: 1076010 タイム 2.time  <----- wrong sense

As you can see, inserting a new sense (or just changing the sense order)
means you need to correct all the references to the reordered senses.
I think this should be done by the application, not by the users (or
editors).
If you used sense IDs internally, you wouldn't have this problem.

I mean something like this :
In the Sense table :
- EntryId 
- SenseId
- Order

The primary key would be EntryId + SenseId.
A unique index on EntryId + Order would be needed too.

In the xref table, record EntryId + SenseId (in place of EntryId + Order)
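
A rough sketch of that layout, using Python's sqlite3 module with an
in-memory database. The table and column names (sense, xref, ord, gloss,
from_entry, to_entry, to_sense) are my own assumptions for illustration and
are not JMdictDB's actual schema. "Order" is a reserved word in SQL, hence
"ord".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sense (
    entry_id INTEGER NOT NULL,
    sense_id INTEGER NOT NULL,   -- stable, never shown to users
    ord      INTEGER NOT NULL,   -- display position, free to change
    gloss    TEXT    NOT NULL,
    PRIMARY KEY (entry_id, sense_id),
    UNIQUE (entry_id, ord)
);
CREATE TABLE xref (
    from_entry INTEGER NOT NULL,
    to_entry   INTEGER NOT NULL,
    to_sense   INTEGER NOT NULL,
    FOREIGN KEY (to_entry, to_sense) REFERENCES sense (entry_id, sense_id)
);
""")

TAIMU = 1076010
conn.executemany("INSERT INTO sense VALUES (?, ?, ?, ?)",
                 [(TAIMU, 1, 1, "time"), (TAIMU, 2, 2, "thyme")])
# 立麝香草 (2187990) points at the "thyme" sense by its stable sense_id.
conn.execute("INSERT INTO xref VALUES (?, ?, ?)", (2187990, TAIMU, 2))

# Insert a new sense in first position. The shift goes through negative
# values so the UNIQUE (entry_id, ord) index is never violated mid-update.
conn.execute("UPDATE sense SET ord = -(ord + 1) WHERE entry_id = ?", (TAIMU,))
conn.execute("UPDATE sense SET ord = -ord WHERE entry_id = ? AND ord < 0",
             (TAIMU,))
conn.execute("INSERT INTO sense VALUES (?, ?, ?, ?)",
             (TAIMU, 3, 1, "new sense for testing"))

# The xref row was not touched, and it still resolves to "thyme".
target = conn.execute("""
    SELECT s.ord, s.gloss
    FROM xref x
    JOIN sense s ON s.entry_id = x.to_entry AND s.sense_id = x.to_sense
    WHERE x.from_entry = ?
""", (2187990,)).fetchone()
print(target)
```

Only the ord column changes when senses are inserted or reordered, so no
xref needs to be corrected by hand.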

Just like EntryIds, a SenseId should not be displayed. It's created when a
new sense is inserted.
I have seen Jim's mail about SenseIds and I am not too keen on having
users type them.

Now, I know the big problem with SenseIds is how to match senses from a
user amendment with the recorded senses. It's easy when the whole sense
hasn't been changed, but it may become overly complex when it has. I am
thinking of cases where senses are split and/or merged.
I had those problems when I tried to maintain my own database with
JMdict, and I suppose Ben is having the same kind of problems with his.
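
To illustrate the easy case and where it stops working, here is a minimal
sketch that matches amended senses to recorded ones by exact gloss content.
The function name match_senses and the data shapes are hypothetical. This is
not how JMdictDB works, and a real solution would need something smarter for
edited, split or merged senses.

```python
def match_senses(recorded, amended):
    """Match amended senses to recorded sense IDs by identical glosses.

    recorded: dict mapping sense_id -> tuple of glosses
    amended:  list of gloss tuples, in the order submitted by the user
    Returns (assignments, orphaned): assignments is a list of
    (sense_id or None, glosses); orphaned lists recorded IDs left unmatched.
    """
    by_content = {}
    for sense_id, glosses in recorded.items():
        by_content.setdefault(glosses, []).append(sense_id)

    assignments = []
    for glosses in amended:
        candidates = by_content.get(glosses)
        # An unchanged sense keeps its ID; anything else is left undecided.
        sense_id = candidates.pop(0) if candidates else None
        assignments.append((sense_id, glosses))

    matched = {sid for sid, _ in assignments if sid is not None}
    orphaned = sorted(set(recorded) - matched)
    return assignments, orphaned


recorded = {1: ("time",), 2: ("thyme",)}
amended = [("new sense for testing",), ("time",), ("thyme", "garden thyme")]
assignments, orphaned = match_senses(recorded, amended)
print(assignments)
print(orphaned)
```

The unchanged "time" sense keeps ID 1 even though it moved. The edited
"thyme" sense comes back unmatched and ID 2 is orphaned: the code cannot
tell whether that is an edit of sense 2 or a deletion plus a new sense,
which is exactly the hard part.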

 JL


