Re: [edict-jmdict] [abbr=...] (Abbreviation cross-references?)
I'll top-post for brevity.
First, in general I think this is a good idea. A problem is that
we have a huge backlog of good ideas, and implementing
them means:
(a) Stuart has to squeeze in the time to develop and test the
changes;
(b) I have to do the same for the conversion routines that feed the
widely used EDICT formats;
(c) the database needs to be converted;
(d) a coordinated cutover needs to happen.
I guess the ideal would be to have a Database Mk 2 effort
with a batch of changes, because the coordination, documentation,
etc. effort alone is hardly worth it for just one feature.
Apropos of the "[abbr=...]" idea:
(a) I like it, *especially* if the XML coming out of it lets me do
EDICT2-style "(See XXXX)" in the one pass.
(b) we are probably pretty close to it anyway.
Consider the (EDICT2) entry: "齧歯 [げっし] /(n) (See 齧歯動物) (abbr) rodent/"
In JMdict at present that is:
...
<xref>齧歯動物</xref>
<misc>&abbr;</misc>
....
If you look at Stuart's "jmdictdb.xml" format (link at the foot of the
database display) it has:
...
<xref type="see" seq="2444900">齧歯動物</xref>
<misc>&abbr;</misc>
....
It would not be a massive change to flip that to:
<xref type="abbr" seq="2444900">齧歯動物</xref>
and I could carry that into JMdict without a lot of pain.
And the EDICT generation would be straightforward.
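
To make the two steps above concrete, here is a minimal sketch in
Python. It is not the actual jmdictdb or EDICT conversion code. The
<sense> wrapper, the <gloss> element, the inline entity declaration
and both function names are assumptions added for illustration; only
the <xref> and <misc> lines come from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <sense> wrapper, <gloss> and the inline
# DOCTYPE are assumptions so that the snippet parses on its own.
SAMPLE = """<!DOCTYPE sense [<!ENTITY abbr "abbreviation">]>
<sense>
<xref type="see" seq="2444900">齧歯動物</xref>
<misc>&abbr;</misc>
<gloss>rodent</gloss>
</sense>"""


def has_abbr_misc(sense):
    """True if the sense carries the abbr misc tag (entity already expanded)."""
    return any(m.text == "abbreviation" for m in sense.findall("misc"))


def flip_abbr_xrefs(sense):
    """Retag type="see" xrefs as type="abbr" in senses marked abbr.

    Returns the number of xrefs changed. A real conversion would need
    manual checking, since a "see" xref in an abbr sense does not
    always point at the full form.
    """
    changed = 0
    if has_abbr_misc(sense):
        for xref in sense.findall("xref"):
            if xref.get("type") == "see":
                xref.set("type", "abbr")
                changed += 1
    return changed


def edict2_sense(sense):
    """Build the EDICT2-style sense text in one pass over the element."""
    parts = ["(See %s)" % x.text for x in sense.findall("xref")]
    abbr_xref = any(x.get("type") == "abbr" for x in sense.findall("xref"))
    if abbr_xref or has_abbr_misc(sense):
        parts.append("(abbr)")
    parts.extend(g.text for g in sense.findall("gloss"))
    return " ".join(parts)


if __name__ == "__main__":
    sense = ET.fromstring(SAMPLE)
    flip_abbr_xrefs(sense)
    print(edict2_sense(sense))
```

The sketch emits "(abbr)" once whether the information comes from the
xref type or from the misc tag, so the EDICT2 output is the same
before and after the flip.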
All of this is dependent on Stuart (and me) having the time
for a lot of coding and testing. I can squeeze out some,
but I don't know how Stuart stands. What I really would like
though is to get a batch of improvements done at the one
time.
Cheers
Jim
On 28 August 2011 23:44, Nils Roland Barth <jdict.nbarth@xoxy.net> wrote:
> I recently inquired (in private correspondence) about
> whether it would be possible to extend or augment the
> existing [abbr] tag with a cross-reference (e.g. [abbr=◯◯◯◯])
> to link from an abbreviated form to the full form
> (this has come up pretty frequently in my submissions).
>
> Following some discussion, it was suggested to move this to
> the list.
>
> AFAICT:
> * an [abbr=] tag (or such) should be pretty easy,
> * some thought about longer-term (esp. other cross-reference types)
> is advised.
>
> Concretely, it would help my editing and using, and some
> others have suggested this, so I would certainly appreciate it!
>
>
> To recap earlier discussion:
> This subject was previously discussed in two threads/contexts last year,
> AFAICT:
>
> * Changing database format
> Changing entities to attribute values
> http://tech.groups.yahoo.com/group/edict-jmdict/message/3561
> Glenn Maynard, Wed Jan 13, 2010 7:05 am
> revived in:
> http://tech.groups.yahoo.com/group/edict-jmdict/message/3753
> Jim Breen, Mon Jun 7, 2010 11:35 pm
>
> * Merging closely related entries (including abbreviations)
> Abbreviations (Was: Combining entries)
> Jim Breen, Tue Jul 13, 2010 1:37 am
> http://tech.groups.yahoo.com/group/edict-jmdict/message/3972
>
> The most [abbr=] focused entry seems to be this:
> http://tech.groups.yahoo.com/group/edict-jmdict/message/4011
> (…and refs).
>
> …and Stuart gave a summary/links to earlier discussions here:
> http://tech.groups.yahoo.com/group/edict-jmdict/message/4100
>
>
> Similar topics are mentioned on the old wish list:
> http://www.csse.monash.edu.au/~jwb/edictredev/edictwishlist.html
> Revise the XML tagging of xrefs, ants, etc.
>
>
> I don’t understand the technicals terribly well, but as I
> understand it, two things would be involved:
> * Adding another kind of cross-reference
> (which is easy)
> * Choosing some syntax
> (e.g. [xref=abbr:...])
> (currently [see=] and [ant=] are special-cased)
>
> It would also be nice to automatically detect existing
> [abbr]/[see=] entries and convert these to [abbr=...]
> which should be relatively easy.
> (Rather, it should be very easy to find these,
> with some manual checking to avoid false positives.)
>
>
> Two technical concerns were raised, AFAICT:
> * Cross-reference syntax moving to new seq=... references,
> rather than old kanji/kana look-up.
> I think this prevented further work (back in 2010), b/c
> changing refs took priority, but AFAICT this has now been done?
>
> * Future cross-reference types – potentially many more,
> e.g., links from species to genus, links to counter words, etc.
>
>
> As per Stuart, this change (adding another xref type) is quite easy.
> It also seems like a quick win (useful, easy),
> and easy for the average user to understand,
> so I’ve a few questions for the list:
>
> 1. Does adding an [abbr=...] (or [xref=abbr:...]) tag sound good?
> 2. Is it technically straightforward?
> 3. Re: different xref types, what should we consider in future?
> (an [xref=abbr:...] style syntax in general is fine,
> for typing/parsing ease an [abbr=] shortcut would be nice.)
> 4. Is it worth the effort?
> (are there more pressing matters –
> should this be put off for another year or so?)
>
> cheers,
> ~nils
>
>
--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Webmaster: Hawthorn Rowing Club, Treasurer: Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne