
Re: Non-ambiguous cross-references (xref/ant): ent_seq?



Jim,

Thank you for the reply. Great to know that this will (eventually) get fixed. Having both the entry ID and the sense number as attributes seems like a good solution.

Looking forward to it :-)

Adam


Thanks for raising this. There is discussion going on among the JMdict
editors about revisions to the structure, and I'll make sure this issue gets fixed as part of it. I've known about the problem for years, but I admit
it's not been on my fix-it list. It is now.

At present the JMdict format handles cross-references in the following form
(quoting entry 1262990):
<xref>スライド・1</xref>
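For illustration, here is a minimal Python sketch of how an app might split the current xref text into its parts. It assumes the separator is the katakana middle dot (U+30FB) and that a trailing all-digit field is a sense number. The function name `parse_xref` is hypothetical, and headwords that themselves contain ・ are not handled.

```python
# Katakana middle dot (U+30FB), the separator used inside <xref> text.
SEP = "\u30fb"

def parse_xref(text):
    """Split xref text such as 'スライド・1' or '駆ける・かける・1' into
    (headword, reading, sense_number); missing parts are None.

    Assumption: a trailing all-digit field is a sense number. This sketch
    does not handle headwords that themselves contain the separator.
    """
    parts = text.split(SEP)
    sense = None
    if len(parts) > 1 and parts[-1].isdigit():
        sense = int(parts.pop())
    headword = parts[0]
    reading = parts[1] if len(parts) > 1 else None
    return headword, reading, sense
```

Even when parsing succeeds, the result only names a headword (plus, at best, a reading), which is why the text alone cannot choose between entries that share it.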

I envisage it moving to something like:
<xref type="see" seq="1073760" sno="1">スライド・1</xref>
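A minimal sketch (Python, standard library only) of how an app could read the proposed attributes. The attribute names `seq` and `sno` follow the proposal above and are not yet part of the published format; `read_xref` and `resolve_sense` are hypothetical helpers, and `entries` is assumed to be an app-built dict mapping ent_seq to a list of senses.

```python
import xml.etree.ElementTree as ET

def read_xref(xml_text):
    """Return (seq, sno, text) for one <xref> element.
    seq and sno are None when the attributes are absent (current format)."""
    el = ET.fromstring(xml_text)
    seq, sno = el.get("seq"), el.get("sno")
    return (int(seq) if seq is not None else None,
            int(sno) if sno is not None else None,
            el.text)

def resolve_sense(entries, seq, sno):
    """Pick sense number sno (1-based) of entry seq from `entries`,
    a hypothetical dict mapping ent_seq -> list of senses."""
    senses = entries[seq]
    return senses[sno - 1]
```

For example, `read_xref('<xref type="see" seq="1073760" sno="1">スライド・1</xref>')` returns `(1073760, 1, 'スライド・1')`, so the target entry and sense are known without any headword lookup, while the element text stays available for derived formats.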

That should enable apps using JMdict to do the linkage correctly and
allow derived formats such as EDICT to continue as before.

It may be quite a while before the new structure is available. I was planning to make a call for suggestions via this list shortly. I also think we should warn app developers well in advance. Something that could be explored is parallel production of "old" and "new" JMdicts for a transition period.
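If old and new files were produced in parallel, an app could accept both during the transition. A minimal sketch, assuming an app-built `index` dict from headword to a list of ent_seq values (the dict and the function name are hypothetical): use `seq` when it is present, otherwise fall back to a headword lookup that may return several candidates.

```python
import xml.etree.ElementTree as ET

def xref_candidates(xml_text, index):
    """Return the ent_seq candidates for one <xref> element.

    New-format xrefs carry a seq attribute and resolve to exactly one
    entry; old-format ones fall back to looking up the headword (the text
    before the first U+30FB separator) and may yield several entries.
    """
    el = ET.fromstring(xml_text)
    seq = el.get("seq")
    if seq is not None:
        return [int(seq)]
    headword = el.text.split("\u30fb")[0]
    return index.get(headword, [])
```

An app could treat a result with more than one candidate as "ambiguous" and show a choice to the user rather than silently picking the first entry.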

Cheers

Jim

On Tue, 30 Jul 2019 at 01:42, 'Adam Nohejl' adam@nohejl.name
[edict-jmdict] <edict-jmdict@yahoogroups.com> wrote:



Hi,

I would like to use JMdict as one of the data sources for a dictionary app. I have noticed that some of the cross-references between entries (xref/ant elements) cannot be resolved unambiguously:

A few examples:

<xref>何れ・1</xref> may refer to ent_seq 1009290, 1566210 (both have multiple senses)
<xref>因・2</xref> may refer to ent_seq 1168640, 1541760, 1922790 (all have multiple senses)
<xref>駆ける・かける・1</xref> may refer to ent_seq 1244720, 1570710 (and the reading does not help to disambiguate)
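Ambiguities like these can be found mechanically. A rough Python sketch, assuming the JMdict XML has been parsed with ElementTree (for example `ET.parse("JMdict_e.xml").getroot()` on a local, uncompressed copy; the filename is an assumption): it lists every kanji (keb) or reading (reb) form shared by more than one entry. That is a superset of the ambiguous targets, since not every shared form is actually referenced by an xref, and a reading given in the xref can sometimes narrow the choice. The function name is hypothetical.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def ambiguous_headwords(root):
    """Given the <JMdict> root element, return {form: [ent_seq, ...]} for
    every keb or reb form that appears in more than one entry."""
    index = defaultdict(set)
    for entry in root.iter("entry"):
        seq = int(entry.findtext("ent_seq"))
        for tag in ("keb", "reb"):
            for form in entry.iter(tag):
                index[form.text].add(seq)
    # Keep only forms shared by several entries, with sorted ent_seq lists.
    return {form: sorted(seqs) for form, seqs in index.items() if len(seqs) > 1}

# Usage (assumed local file):
# root = ET.parse("JMdict_e.xml").getroot()
# print(len(ambiguous_headwords(root)))
```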

There are dozens more. I have also noticed that the popular iOS app Imiwa, which uses JMdict, often points the user to the wrong entry, so it's not just my problem.

I can see from the web search interface that you refer to the ent_seq IDs internally, but they are missing from the downloadable XML file:

<xref type="see" seq="1009290">何れ・1</xref>
http://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&disp=xml&e=2006359

Is there any chance of getting a file with unambiguous references?

Best regards,

--
Adam



--
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/

--
Adam Nohejl