Re: [edict-jmdict] <xref> tags without a destination or with too many destinations
On Wed, 03 Nov 2010 20:32:42 -0600, Stuart McGraw <smcg4191@frii.com>
wrote:
> On 11/03/2010 01:43 PM, Jean-Luc Léger wrote:
>>[...]
>> Would be glad to see those sense Ids in the Jmdict file.
>> Be it a small number, a big number or a string : I don't care.
>
> I'm interested in how you and Glen (or anyone else) would use them.
> The reason I ask is because I thought a bit about adding them in
> the database long ago but decided then not to because a sense doesn't
> seem to have much of an identity -- it is really just a container
> for glosses (and maybe a sense note) and as such, one container
> is as good as another. I had not then realized that they do have
> identities by virtue of being pointed to by xrefs. But I wonder
> if there are any other reasons?
Well, take the Tanaka Corpus file. It links to entries through headwords
and to senses through the ordinal position of the sense in its entry.
The big problem is that those ordinal positions may change: a new sense can
be inserted, or senses can be reordered, without changing the content of
the referred sense.
So it needs a lot of maintenance.
JMDictDB has the same problem with an xref to a specific sense. I have just
run a small test in the test database:
Before the test:
タイム [ gai1,ichi1]
1. [n]
▶ time
References to this sense:
see: 2083020 たんま 1.(children's language) to interrupt a game;time out
2. [n]
▶ thyme
References to this sense:
see: 2187990 立麝香草 【 タチジャコウソウ 】 1.thyme
see: 2188000 木立ち百里香 【 きだちひゃくりこう 】 1.thyme
and
立麝香草
【 タチジャコウソウ ( nokanji ) ; たちじゃこうそう 】
1. [n] [uk]
▶ thyme
Cross references:
see: 1076010 タイム 2.thyme
Now what happens to those xrefs when I add a new sense to タイム in first
position (moving sense 1 to 2 and sense 2 to 3)?
The answer is here:
http://www.edrdg.org/~smg/cgi-bin/entr.py?e=1032953&svc=jmtest&sid=
http://www.edrdg.org/~smg/cgi-bin/entr.py?svc=jmtest&sid=&e=1032951
タイム [ gai1,ichi1]
1. [n]
▶ new sense for testing
References to this sense:
see: 2083020 たんま 1.(children's language) to interrupt a game;time out <----- wrong sense
2. [n]
▶ time
References to this sense:
see: 2188000 木立ち百里香 【 きだちひゃくりこう 】 1.thyme <----- wrong sense
see: 2187990 立麝香草 【 タチジャコウソウ 】 1.thyme
3. [n]
▶ thyme
立麝香草
【 タチジャコウソウ ( nokanji ) ; たちじゃこうそう 】
1. [n] [uk]
▶ thyme
Cross references:
see: 1076010 タイム 2.time <----- wrong sense
As you see, inserting a new sense (or just changing the sense order) means
you need to correct all the references to the reordered senses.
I think this should be done by the application, not by the users (or
editors).
If you used Sense Ids internally, you wouldn't have this problem.
I mean something like this:
In the Sense table :
- EntryId
- SenseId
- Order
the Primary Key would be EntryId + SenseId
a unique index on EntryId + Order would be needed too
In the xref table, record EntryId + SenseId (in place of EntryId + Order)
Just like EntryIds, a SenseId should not be displayed. It's created when a
new sense is inserted.
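To make the idea concrete, here is a minimal sketch using Python and an in-memory SQLite database. It is not the JMDictDB schema: the table and column names (sense, xref, entry_id, sense_id, ord, gloss) and the helper functions are my own assumptions for illustration, and taking "highest SenseId + 1" for a new sense is only one possible way to allocate Ids. It replays the タイム test above, but with the xref stored as EntryId + SenseId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sense (
    entry_id INTEGER NOT NULL,
    sense_id INTEGER NOT NULL,          -- stable, never shown to users
    ord      INTEGER NOT NULL,          -- display position, may change
    gloss    TEXT    NOT NULL,
    PRIMARY KEY (entry_id, sense_id),
    UNIQUE (entry_id, ord)
);
CREATE TABLE xref (
    from_entry INTEGER NOT NULL,
    from_sense INTEGER NOT NULL,
    to_entry   INTEGER NOT NULL,
    to_sense   INTEGER NOT NULL,        -- a sense_id, not a position
    FOREIGN KEY (to_entry, to_sense) REFERENCES sense (entry_id, sense_id)
);
""")

def insert_sense(conn, entry_id, position, gloss):
    """Insert a new sense at `position`, shifting later senses down by one."""
    with conn:
        # Shift in two steps (via negative values) so the unique index on
        # (entry_id, ord) is never violated in the middle of the update.
        conn.execute(
            "UPDATE sense SET ord = -(ord + 1) WHERE entry_id = ? AND ord >= ?",
            (entry_id, position))
        conn.execute(
            "UPDATE sense SET ord = -ord WHERE entry_id = ? AND ord < 0",
            (entry_id,))
        (new_id,) = conn.execute(
            "SELECT COALESCE(MAX(sense_id), 0) + 1 FROM sense WHERE entry_id = ?",
            (entry_id,)).fetchone()
        conn.execute("INSERT INTO sense VALUES (?, ?, ?, ?)",
                     (entry_id, new_id, position, gloss))
    return new_id

def xref_targets(conn, from_entry):
    """Return (current position, gloss) of every sense the entry points to."""
    return conn.execute(
        """SELECT s.ord, s.gloss
             FROM xref x
             JOIN sense s ON s.entry_id = x.to_entry AND s.sense_id = x.to_sense
            WHERE x.from_entry = ?
            ORDER BY s.ord""",
        (from_entry,)).fetchall()

# The test case from this mail: タイム (1076010) and 立麝香草 (2187990).
conn.executemany("INSERT INTO sense VALUES (?, ?, ?, ?)", [
    (1076010, 1, 1, "time"),
    (1076010, 2, 2, "thyme"),
    (2187990, 1, 1, "thyme"),
])
conn.execute("INSERT INTO xref VALUES (2187990, 1, 1076010, 2)")
conn.commit()

before = xref_targets(conn, 2187990)
insert_sense(conn, 1076010, 1, "new sense for testing")
after = xref_targets(conn, 2187990)
print(before, after)
```

In this sketch the xref row is never touched by the insertion. Only the Order column changes, so the reference from 立麝香草 still resolves to "thyme", which is now displayed as sense 3.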
I have seen Jim's mail about SenseId and I am not too keen on having
users type them.
Now, I know the big problem with SenseIds is how to match senses from a
user amendment with the recorded senses. It's easy when the whole sense
hasn't been changed, but it may become overly complex when it has. I am
thinking of cases where senses are split and/or merged.
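For the easy case only, the matching could look roughly like the sketch below. It assumes a sense is represented simply as a tuple of its glosses, and the function name match_senses is hypothetical. A submitted sense whose glosses are identical to a recorded one keeps that SenseId; everything else is reported as unmatched and left for an editor (or a smarter algorithm) to resolve. It makes no attempt to handle split or merged senses.

```python
def match_senses(recorded, submitted):
    """Match submitted senses to recorded SenseIds by identical content.

    recorded:  dict mapping SenseId -> tuple of glosses
    submitted: list of gloss tuples, in the order given in the amendment
    Returns (matched, new_positions, orphaned_ids) where
      matched       maps new position (1-based) -> existing SenseId,
      new_positions lists positions with no identical recorded sense,
      orphaned_ids  lists recorded SenseIds nothing was matched to.
    """
    by_content = {}
    for sense_id, glosses in recorded.items():
        by_content.setdefault(glosses, []).append(sense_id)

    matched, new_positions = {}, []
    for position, glosses in enumerate(submitted, start=1):
        candidates = by_content.get(glosses)
        if candidates:
            matched[position] = candidates.pop(0)
        else:
            new_positions.append(position)

    orphaned_ids = sorted(i for ids in by_content.values() for i in ids)
    return matched, new_positions, orphaned_ids

recorded = {1: ("time",), 2: ("thyme",)}
submitted = [("new sense for testing",), ("time",), ("thyme",)]
result = match_senses(recorded, submitted)
print(result)
```

With the タイム amendment from the test above, positions 2 and 3 keep SenseIds 1 and 2, and position 1 is reported as a new sense. When both the unmatched list and the orphaned list are non-empty, the sketch cannot tell an edited sense from a delete-plus-insert, which is exactly the hard case.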
I have had those problems when I tried to maintain my own database with
JMdict, and I suppose Ben is having the same kind of problems with his.
JL