
Re: [edict-jmdict] Combining entries



[Picking up a point or two from some days back...]

On 13 July 2010 20:21, Francis Bond <bond@ieee.org> wrote:
> Didn't we at some dim dark time in the past agree to treat these more like:
>
> "Fabaceae/pea family of plants (gloss)" or possibly (expl)?
>
> That way we keep a clear distinction between translation
> equivalents and explanations, which is useful for MT users
> and reverse look up, among other things.

I know we settled on making acronyms & initials a distinct gloss.
We also discussed tagging glosses as to whether they were translations
or explanations, but there wasn't the means at that stage to
implement something like that.

I've been comfortable with "translation (explanation)" since in
WWWJDIC at least things in parentheses are excluded from
"exact match" searches.
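
As a rough illustration of that behaviour (a sketch only, not WWWJDIC's
actual code; the function names strip_parens and exact_match are
hypothetical), an exact-match test could drop parenthesised text from
the gloss before comparing:

```python
import re

# Matches one parenthesised group plus any whitespace in front of it.
# Assumption: parentheses are not nested inside a gloss.
_PAREN = re.compile(r"\s*\([^()]*\)")


def strip_parens(gloss: str) -> str:
    """Return the gloss with parenthesised explanations removed."""
    return _PAREN.sub("", gloss).strip()


def exact_match(query: str, gloss: str) -> bool:
    """Case-insensitive comparison against the translation part only."""
    return strip_parens(gloss).lower() == query.strip().lower()


gloss = "Fabaceae (pea family of plants)"
print(exact_match("Fabaceae", gloss))              # True
print(exact_match("pea family of plants", gloss))  # False
```

With this convention the explanation stays visible to readers but does
not count as a translation equivalent when matching.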

But I think the bigger exploration of this issue has yet to be
completed.

> What do people think of adding more external links (to e.g.
> Wikipedia and Wordnet/MLSN) for people that want more detailed
> information?  This would involve treating the links as first-class
> entities so that we can correct them (link the right sense to the
> right external thing).  By taking this approach, those with grandiose
> visions of a dictionary with all possible information (like me) can
> get closer to it, albeit as a distributed resource, while allowing
> JMDict to keep a reasonable size.  Both Wikipedia and WordNet
> welcome collaboration (^_^).
>
> I think there is also a practical (aesthetic?) issue in adding
> very detailed information to JMDict senses --- at the moment
> most entries have roughly the same amount of information and
> I think that this makes it easier to use.  If we start making some
> entries very detailed then we lose the sense of overall balance.

I like the two-tier approach, with links to more detailed information. I
even allowed a structure for it in the DTD back in 1998/99:

<!ELEMENT links (link_tag, link_desc, link_uri)>
<!ELEMENT link_tag (#PCDATA)>
<!ELEMENT link_desc (#PCDATA)>
<!ELEMENT link_uri (#PCDATA)>
        <!-- This element holds details of linking information to
        entries in other electronic repositories. The link_tag will be
        coded to indicate the type of link (text, image, sound), the
        link_desc will provide a textual label for the link, and the
        link_uri contains the actual URI.  -->
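
To make the structure concrete, here is a minimal sketch of reading one
such element. Everything in the sample is an assumption for
illustration: the link_tag value "text", the description, and the
example.org URI are made up, and parse_links is a hypothetical helper,
not part of any JMdict tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the <links> element declared above.
SAMPLE = """
<links>
  <link_tag>text</link_tag>
  <link_desc>Encyclopedia article</link_desc>
  <link_uri>https://example.org/article</link_uri>
</links>
"""

# The DTD content model requires exactly these children, in this order.
EXPECTED = ["link_tag", "link_desc", "link_uri"]


def parse_links(xml_text: str) -> dict:
    """Parse one <links> element into a dict, checking the child order."""
    elem = ET.fromstring(xml_text.strip())
    if elem.tag != "links":
        raise ValueError(f"expected <links>, got <{elem.tag}>")
    children = [child.tag for child in elem]
    if children != EXPECTED:
        raise ValueError(f"unexpected children: {children}")
    return {child.tag: (child.text or "").strip() for child in elem}


link = parse_links(SAMPLE)
print(link["link_tag"], link["link_uri"])
```

The order check stands in for real DTD validation, which ElementTree
does not perform.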

It never went anywhere, and I have ended up dealing with it inside
WWWJDIC, where I add links to the sound clips, Japanese Wikipedia, jeKai,
etc.

Some of those are still best left to be done at the user/server level,
but having the capability of including links at the entry, sense and
perhaps even gloss level is worth exploring.

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne