Re: [edict-jmdict] Abbreviations (Was: Combining entries)



Picking this up from a couple of days ago, as I think it will
eventually get somewhere useful.

On 14 July 2010 22:54, Stuart McGraw <smcg4191@frii.com> wrote:
> Right now, tags consist of a tag-type (pos, misc, fld, etc)
> and a value (v5u, col, comp, etc.) and one can give both
> explicitly: [pos=v5u]. ....

[useful background trimmed]

> The only (I think) exceptions are "see=" and "ant=". There is only
> really one "xref" tag-type (just as there is one "lsrc" tag-type)
> and "see" and "ant" (along with the referenced entry) are really
> just tag-values just as "fre" or "spa" (along with the lsource text)
> are values of an "lsrc" tag. But because they are so common and likely
> to be used in any installation of jmdictdb, it seemed reasonable to
> make them into pseudo-tag-types. Note that being able to use [see=...]
> is not analogous to using [v5u] rather than [pos=v5u]. For the
> "v5u" case, the parser doesn't need to know about the token "v5u" --
> it simply searches for it in the keyword tables when it encounters
> it.
>
> But knowledge about "see" and "ant" is hardwired into the parser
> (just as knowledge about "pos" and "misc" and anything else used
> on the left side of the "=" in a tag like [pos=v5u].) So it is
> desirable to not have an unbounded set of tokens that can
> be used as tag-types.

Thanks. And what is really being suggested is that the
hard-wired exceptions of [see=...] and [ant=...] be joined
by [abbr=...].
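
To make the distinction concrete, here is a rough Python sketch of the
behaviour Stuart describes. It is not the actual jmdictdb parser: the
KEYWORDS table, the function name parse_tag and the tuple it returns
are all made up for illustration. Bare values such as [v5u] are looked
up in the keyword tables, while anything on the left of the "=" has to
be known to the parser by name, which is where "abbr" would join "see"
and "ant".

```python
# Hypothetical keyword tables; the real jmdictdb tables are not shown in
# this thread, so these contents are assumptions for illustration only.
KEYWORDS = {
    "pos": {"v5u", "n"},
    "misc": {"col"},
    "fld": {"comp"},
}

# Pseudo-tag-types hardwired into the parser. "see" and "ant" exist now;
# "abbr" is the proposed addition.
XREF_PSEUDO_TYPES = {"see", "ant", "abbr"}


def parse_tag(tag):
    """Parse one bracketed tag into (tag_type, value, target)."""
    text = tag.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("not a bracketed tag: %r" % tag)
    body = text[1:-1]

    if "=" not in body:
        # Bare value such as [v5u]: the parser needs no built-in knowledge
        # of the token, it just searches the keyword tables for it.
        matches = [t for t, values in KEYWORDS.items() if body in values]
        if len(matches) != 1:
            raise ValueError("unknown or ambiguous value: %r" % body)
        return (matches[0], body, None)

    left, right = body.split("=", 1)
    if left in XREF_PSEUDO_TYPES:
        # [see=...], [ant=...], [abbr=...] are shorthand for an xref whose
        # value is the name on the left-hand side.
        return ("xref", left, right)
    if left == "xref":
        # General form [xref=type:value]; the type is not hardwired here
        # (in a real system it would be checked against a table).
        kind, sep, target = right.partition(":")
        if not sep:
            raise ValueError("expected [xref=type:value]: %r" % tag)
        return ("xref", kind, target)
    if left in KEYWORDS:
        if right not in KEYWORDS[left]:
            raise ValueError("unknown %s value: %r" % (left, right))
        return (left, right, None)
    raise ValueError("unknown tag-type: %r" % left)
```

With this sketch, [abbr=ナニナニ] and [xref=abbr:ナニナニ] both come out
as ("xref", "abbr", "ナニナニ"), so the shorthand costs the parser one
more hardwired name and nothing else.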

> It may be true that it seems now that only a "few" values will
> ever be needed for these xref pseudo-tag-types but when I
> was thinking about how to use xrefs, I could think of many possible
> xref types that could be useful. A recent email discussion with
> Jim and James Rose discussed the possibility of using xrefs to
> associate an entry for a biological species with the entry for its
> family or genus. When you also consider that the database may well
> contain other dictionaries and corpora in addition to JMdict, the
> possibilities for the usefulness of other kinds of xrefs increase.
> Finally, because xref types are intended to be user-definable by
> simply adding entries to the database kwxref table, the parser
> needs to be able to parse them without having hardwired knowledge
> about what xref types are available.

I can see that others may be proposed, but I feel:

(a) there is a really strong case for a simple double-headed "abbr"
link between entries, one of which is an abbreviation of the other;

(b) the need is such that it justifies the extension of the see/ant
exceptions to include it. We want to encourage people to enter them.

(c) other xref types can probably be handled by the general
[xref=type:value] construct, as they are more likely to be entered by
skilled users.

[...]

> I'll close by saying despite all the above I am not set on the syntax
> I proposed, nor unmovingly opposed to an [abbr=...] tag -- I just wanted
> to point out some of the factors that need to be considered.

Appreciated.

If we can "go forward" with this (to use our newish Prime Minister's
catch-phrase), I'd be looking for the JMdict xml to have something like
<abbr>ナニナニ</abbr> (which in the long run may morph into
<xref type="abbr">ナニナニ</xref> or <xref type="abbr" value="ナニナニ"/>).

In EDICT2 this would become "(abbr) (See ナニナニ)" as now, or possibly
"(See ナニナニ(abbr))". Either way the ナニナニ would be turned into
a hyperlink in WWWJDIC.
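
As a sketch only, the mapping from the candidate XML forms to the two
EDICT2 renderings might look like the Python below. The function name
edict2_xref_text and the "combined" switch are invented for the
example, and none of the XML forms is settled; this is not how the
EDICT2 file is actually generated.

```python
import xml.etree.ElementTree as ET


def edict2_xref_text(sense_xml, combined=False):
    """Render abbreviation cross-references in a <sense> fragment.

    Hypothetical helper: accepts <abbr>X</abbr>, <xref type="abbr">X</xref>
    and <xref type="abbr" value="X"/>, the three forms floated above.
    """
    sense = ET.fromstring(sense_xml)
    parts = []
    for el in sense:
        if el.tag not in ("abbr", "xref"):
            continue
        # The target is either the element text or a "value" attribute.
        target = el.text or el.get("value")
        is_abbr = el.tag == "abbr" or el.get("type") == "abbr"
        if not is_abbr:
            parts.append("(See %s)" % target)
        elif combined:
            parts.append("(See %s(abbr))" % target)
        else:
            parts.append("(abbr) (See %s)" % target)
    return " ".join(parts)


if __name__ == "__main__":
    print(edict2_xref_text('<sense><abbr>ナニナニ</abbr></sense>'))
    print(edict2_xref_text(
        '<sense><xref type="abbr" value="ナニナニ"/></sense>', combined=True))
```

Whichever XML form wins, the EDICT2 side only needs the target and the
fact that the link is an "abbr" one.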

Cheers

Jim

--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne