
Re: parsing submission data (was: [edict-jmdict] xrefs in WWWJDIC)



On 03/06/07, Stuart McGraw <smcg4191@frii.com> wrote:
Jim Breen wrote:
 > Actually [[s_lang="en: the source word"]]  would probably be better, just to
 > emphasise to the user that the language code is needed.
 >
 > "from" may well be better than "s_lang", but it might confuse with "trans"
 > and "lit". (Many people get those confused.)

 The last time this came up,
 http://tech.groups.yahoo.com/group/edict-jmdict/message/1503,
 http://tech.groups.yahoo.com/group/edict-jmdict/message/1524
 you were considering making trans a sense.lsource
 attribute, e.g.
   <lsource lang="en" translit="soapland">
 rather than a tagged gloss

Well, I had __ls_type="wasei"__ in there, as "soapland"
isn't meaningful English...

   <gloss lang="en" translit>soapland</gloss>
 or gloss-like element
   <translit lang="en">soapland</translit>
 (BTW, I think the last two are informationally equivalent
 and would have identical representations in the database.)

That's still my thinking. I don't want a real gloss to contain broken or
non-English. The 和製英語 from which a loanword is transliterated is of
interest, but it must be seen as information relating to a sense, not a
translation of the Japanese word itself.

 If you did go with the first, then it would come down to
 teaching people:
 1. Use [from....] to specify the foreign language word
     or pseudo-word a Japanese word was derived from.
   1a.  Use [from ... trans*] when that word is not a real
     word in the source language.

Yes.

 2. Use a gloss to provide the meaning of the Japanese
     word in English (or the specified gloss language).
     May or may not be the same as the word given in
     [from:...] (but will never be the same as [from...trans].)

Yes, but there's no need for a [from="..."] if the 外来語 is from a
real English/etc. word or phrase.

 3. Use [lit] for a gloss that is an unusually word-for-word
     translation (but still a legitimate word/whatever in the
     gloss language)   [This is not expressed very clearly
     but you get the idea I hope.]

Mostly. Take the recent entry: 鬼の居ぬ間に洗濯. It contains:
"(lit: refreshing oneself while the ogre is gone)." As a gloss
that is uselessly literal, and should not be regarded as a gloss at all,
but rather as information about the background/structure of the Japanese
phrase (most occurrences of "lit:" are for expressions like that).

 I suspect that documenting and teaching people this will
 be easier than teaching them some other things, like when
 a gloss goes in an existing sense and when it is a new sense.

I think so too.

For a learnable syntax, how about:

[from="fr: avec"] or just [from=it:] in the case of フェットチーネ.
and
[wasei="soaplady"] or [wasei="de: gebroken Deutsch"]
and
[lit="refreshing oneself while the ogre is gone"]

In fact the first two will end up as the same XML element, but with
different attributes.
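
To make that concrete, here is a rough Python sketch of how a submission
parser might read those three tags and fold [from=...] and [wasei=...] into
one element. It is only an illustration under assumptions: the function names
parse_tags and to_lsource are hypothetical, the two-or-three-letter language
code pattern is a guess, and the element and attribute names (lsource, lang,
ls_type) are borrowed from the examples earlier in this thread, not from any
settled DTD.

```python
import re
import xml.etree.ElementTree as ET

# Matches [from="fr: avec"], [from=it:], [wasei="soaplady"], [lit="..."].
# The quotes around the value are optional, as in the bare [from=it:] form.
TAG_RE = re.compile(r'\[(from|wasei|lit)=("?)([^"\]]*)\2\]')

# Optional leading language code, e.g. "fr: avec" or a bare "it:".
# Assumption: codes are two or three lowercase letters.
LANG_RE = re.compile(r'^([a-z]{2,3}):\s*(.*)$')


def parse_tags(text):
    """Return a list of (name, lang, value) tuples for the tags in text."""
    tags = []
    for name, _quote, body in TAG_RE.findall(text):
        m = LANG_RE.match(body)
        if m:
            lang, value = m.group(1), m.group(2)
        else:
            # No language code given; leave it unset rather than guess.
            lang, value = None, body
        tags.append((name, lang, value))
    return tags


def to_lsource(name, lang, value):
    """Build a hypothetical <lsource> element for a from or wasei tag."""
    elem = ET.Element("lsource")
    if lang:
        elem.set("lang", lang)
    if name == "wasei":
        # Same element as [from=...], distinguished only by an attribute.
        elem.set("ls_type", "wasei")
    elem.text = value or None
    return elem


if __name__ == "__main__":
    sample = '[from="fr: avec"] [wasei="de: gebroken Deutsch"] [lit="word for word"]'
    for name, lang, value in parse_tags(sample):
        if name in ("from", "wasei"):
            print(ET.tostring(to_lsource(name, lang, value), encoding="unicode"))
        else:
            print("lit:", value)
```

A [lit=...] tag is deliberately not turned into an lsource here, since it
describes the structure of the Japanese phrase rather than a source word;
where it should end up in the XML is left open.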

Cheers

Jim

--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/