Re: [edict-jmdict] Abbreviations (Was: Combining entries)
2010/7/19 Stuart McGraw <smcg4191@frii.com>:
> On 07/16/2010 09:20 PM, Jim Breen wrote:
>> (a) there is a really strong case for a simple double-headed "abbr"
>> link between entries, one of which is an abbreviation of the other;
>
> By "double-headed" do you mean two xrefs, one in each entry pointing
> to the other?
I meant both.
>> (b) the need is such that it justifies the extension of the see/ant
>> exceptions to include it. We want to encourage people to enter them.
>>
>> (c) other xref types can probably be handled by the general
>> [xref=type:value] construct, as they are more likely to be entered by
>> skilled users.
>
> I'm not sure I find the above all that convincing: people enter extra
> characters now for the sake of consistent syntax and to maintain a
> table-driven design (as opposed to hardwiring knowledge of every tag
> into the parser). I've been reading the edict list for a number of years
> and, unlike "synonym", don't think I've ever seen mention of an "abbr"
> xref until a few days ago. I don't see any reason to believe that a
> few months from now there won't be another xref type that is so needed
> that it too has to be hardcoded into the parser.
That's a fair point. Also I've been looking at the places where this may apply.
There are lots of entries like:
カンペ /(n) (abbr) (See カンニングペーパー) large sketchbook used .....
It should be quite feasible to extract a set of these and create SQL
sequences to convert them into a new style.
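As a rough illustration of that extraction step, here is a minimal Python sketch. It assumes the entries are available as EDICT-format text lines like the example above, and it uses the [xref=abbr:...] form purely as a placeholder for the "new style", which has not been settled. The function names are hypothetical, and the real conversion would more likely be done in SQL against the database.

```python
import re

# Matches "(abbr)" immediately followed by a "(See TARGET)" cross-reference.
ABBR_SEE = re.compile(r"\(abbr\)\s*\(See\s+([^)]+)\)")


def extract_abbr_xref(line):
    """Return (headword, target) if the line has '(abbr) (See target)', else None."""
    headword, sep, rest = line.partition(" /")
    if not sep:
        return None
    match = ABBR_SEE.search(rest)
    if match is None:
        return None
    # The headword field may carry a reading in brackets, e.g. "漢字 [かんじ]".
    return headword.split(" [")[0].strip(), match.group(1).strip()


def rewrite_abbr_xref(line):
    """Rewrite '(abbr) (See X)' as the placeholder '[xref=abbr:X]' form (an assumption)."""
    return ABBR_SEE.sub(lambda m: "[xref=abbr:" + m.group(1).strip() + "]", line)


if __name__ == "__main__":
    sample = "カンペ /(n) (abbr) (See カンニングペーパー) large sketchbook used ....."
    print(extract_abbr_xref(sample))
    print(rewrite_abbr_xref(sample))
```

Lines without the "(abbr) (See ...)" pair are left alone, so the same pass can be run over the whole file to produce a candidate list for review.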
[...]
>> If we can "go forward" with this (to use our newish Prime Minister's
>> catch-phrase), I'd be looking for the JMdict xml to have something like
>> <abbr>ナニナニ</abbr> (which in the long run may morph into
>> <xref type="abbr">ナニナニ</xref> or <xref type="abbr" value="ナニナニ"/>)
>
> I am discouraged to read this. As you'll recall, I have been
> advocating for several years that the XML for cross refs give the
> type as an attribute and include an explicit mention (also as an
> attribute) of the referenced entry's sequence number. (See for
> example this April 2007 post:
> http://tech.groups.yahoo.com/group/edict-jmdict/message/1490)
>
> Rather than adding a new element that will later need to be changed
> again to the more general
> <xref type="abbr" seq="nnnnnnn">ナニナニ</xref>
> (or similar) form, ISTM that it would make sense to make the change
> to the latter form now. Either all xrefs could be changed to this
> form now, or for backward compatibility, it could be used only for
> the new abbr xrefs with <see> and <ant> remaining but growing a
> seq="nnnnnnn" attribute. (I believe that the common convention in
> the xml/html world of ignoring unknown attributes would cause this
> change to introduce at most only a very small amount of backward
> incompatibility.)
That makes sense to me. So hold off until JMdict moves to a revised
DTD and set of entities. (I quite agree about getting the sequence number
explicitly into the xrefs.)
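For the record, a minimal Python sketch of what the proposed form might look like when generated and read back. The element and attribute names follow Stuart's example; the sequence number 1234567 is a made-up placeholder, and the helper names are hypothetical. Nothing here reflects an agreed DTD.

```python
import xml.etree.ElementTree as ET


def make_xref(xref_type, seq, text):
    """Build <xref type="..." seq="...">text</xref> (proposed form, not in the current DTD)."""
    elem = ET.Element("xref", {"type": xref_type, "seq": str(seq)})
    elem.text = text
    return elem


def read_xref_text(xml_fragment):
    """A consumer that ignores attributes entirely and reads only the element text."""
    return ET.fromstring(xml_fragment).text


if __name__ == "__main__":
    xref = make_xref("abbr", 1234567, "ナニナニ")  # 1234567 is a placeholder seq
    serialized = ET.tostring(xref, encoding="unicode")
    print(serialized)
    # An older consumer that only looks at the text still gets the headword,
    # both for the new <xref> and for a <see> that has grown a seq attribute.
    print(read_xref_text(serialized))
    print(read_xref_text('<see seq="1234567">ナニナニ</see>'))
```

The second half is the backward-compatibility argument in miniature: code that reads only the element text is unaffected by the extra attributes, though a validating parser working from the old DTD would still reject them.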
I'll put this into the wishlist.
jim
--
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Treasurer: Hawthorn Rowing Club, Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne