
Re: [edict-jmdict] Re: A few changes



[phil_ronan ([edict-jmdict] Re: A few changes) writes:]
>> Coincidentally, I noticed the same problem in the entry for
>> キャロット (=en:carrot/fr:calotte) a couple of days ago. I originally
>> considered moving the <lang> tags to the <sense> sections or adding
>> etymology restriction tags (like <ke_restr> and <re_restr>), but it
>> doesn't seem that either of these approaches would be particularly
>> workable.

Phil sent in a message along these lines via an amendment form, so I
suggested he join the list.

>> I just came across Jim's solution this morning. As I understand
>> it, it should look something like this:
>> 
>> Plan "A"
>> ========
>> 
>> Add "esense" and "src_lang" attributes to the <etym> tags:
>> 
>> <!-- Plan "A" -->
>> <entry>
>>   <r_ele>
>>     <reb>キャロット</reb>
>>   </r_ele>
>>   <info>
>> *   <etym esense="1" src_lang="en">carrot</etym>

I would not have that there. Since the default case for a 外来語
is that it comes from English and means much the same as the 
English, I'd leave that as implied.

>> *   <etym esense="2" src_lang="fr">calotte</etym>
>>   </info>
>>   <sense>
>>     <gloss>carrot</gloss>
>>   </sense>
>>   <sense>
>>     <gloss>calotte (type of hat)</gloss>
>>   </sense>
>> </entry>
>> 
>> Pros: I assume this would mean getting rid of the <lang> tags
>>  altogether, which makes sense --- the source language is part of
>>  the etymology, after all, so there's no need for both tags.

I agree.

>> Cons: Is it just me, or is anyone else uncomfortable with those
>>  forward references?

I'm comfortable with it. After all, assuming the English stays implied,
an esense attribute is only needed if there are multiple senses
and the source language word is restricted to a subset of them.
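
A minimal sketch of how a consumer of the file might apply that rule,
assuming the "esense" and "src_lang" attributes proposed above (they are
not in the current DTD) and assuming "esense" holds a single 1-based
sense number. The function name sense_sources is hypothetical. Any sense
without a matching <etym> falls back to the implied English default:

```python
import xml.etree.ElementTree as ET

# Hypothetical Plan "A" entry, with the English etymology left implied.
ENTRY_XML = """
<entry>
  <r_ele>
    <reb>キャロット</reb>
  </r_ele>
  <info>
    <etym esense="2" src_lang="fr">calotte</etym>
  </info>
  <sense>
    <gloss>carrot</gloss>
  </sense>
  <sense>
    <gloss>calotte (type of hat)</gloss>
  </sense>
</entry>
"""


def sense_sources(entry_xml, default_lang="en"):
    """Return (sense number, glosses, source language, source word) per sense.

    Assumes esense, when present, is a single 1-based sense number.
    """
    entry = ET.fromstring(entry_xml)
    restricted = {}      # sense number -> (language, source word)
    unrestricted = None  # an <etym> with no esense applies to every sense
    for etym in entry.findall("info/etym"):
        source = (etym.get("src_lang", default_lang), etym.text)
        esense = etym.get("esense")
        if esense is None:
            unrestricted = source
        else:
            restricted[int(esense)] = source

    rows = []
    for number, sense in enumerate(entry.findall("sense"), start=1):
        glosses = [gloss.text for gloss in sense.findall("gloss")]
        # Fall back to the implied default (English, no explicit source word).
        lang, word = restricted.get(number) or unrestricted or (default_lang, None)
        rows.append((number, glosses, lang, word))
    return rows
```

With the entry above, sense 1 resolves to the implied English default and
sense 2 to the French "calotte", so no forward reference needs resolving
for the common case.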

>> I'm beginning to wonder if it might make more sense to put words
>> with different etymologies into separate entries:

From the point of view of database tidiness it makes sense, but from a
working dictionary point of view, I'd prefer to get one entry when I ask
for ウエスト.

Also, as Paul pointed out, the coupling between
EDICT and the Tanaka corpus is at the headword level. We have partially 
marked up the word indices attached to the corpus to indicate sense (by 
number). From a pragmatic point of view I'd hate to see all that work
blown away, and a heap of rather tricky mods made to wwwjdic, etc. just to
get a more elegant solution to the representation of a minority of
外来語.


>> Plan "B"
>> ========
>> Pros: No need to change the DTD. Since the words are presumably
>>  unrelated, it would make sense to provide separate <info>
>>  sections for each. For example, I think we can assume that a
>>  bibliography entry for キャロット/calotte would be of little value
>>  in the dictionary definition of キャロット/carrot.

>> Cons: Redundancy, perhaps?

Multiple display of entries.
Editing needed to move from the present structure.
Breaking of currently working wwwjdic/examples links.
I'd need to recombine them for glossing purposes (glossdic).
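
To illustrate the recombination step, here is a rough sketch that merges
split entries sharing a reading back into one sense list. The dict layout
and the name recombine_by_reading are assumptions made for illustration
only; this is not how glossdic or wwwjdic actually store entries.

```python
def recombine_by_reading(entries):
    """Merge entries that share a reading, keeping senses in input order."""
    merged = {}
    for entry in entries:
        merged.setdefault(entry["reb"], []).extend(entry["senses"])
    return merged


# Hypothetical Plan "B" data: one entry per etymology.
split_entries = [
    {"reb": "キャロット", "senses": ["carrot"]},
    {"reb": "キャロット", "senses": ["calotte (type of hat)"]},
]

combined = recombine_by_reading(split_entries)
```

Note that the sense numbers in the merged list depend on the order of the
split entries, which is where the existing sense-numbered links to the
examples would be at risk.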

>> I'd be interested to hear what other people think.

Thanks for the comments & suggestions.

Jim

-- 
Jim Breen                                http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology,               Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia                  Fax: +61 3 9905 5146
(Monash Provider No. 00008C)                ジム・ブリーン@モナシュ大学