RE: parsing submission data (was: [edict-jmdict] xrefs in WWWJDIC)



Jim Breen wrote:
> On 02/06/07, Stuart McGraw <smcg4191@frii.com> wrote:
> > Jim Breen wrote:
> >  > On 01/06/07, Stuart McGraw <smcg4191@frii.com> wrote:
[...]
> >  > >  Currently for lsource, which is not a simple tag, I use a dedicated
> >  > >  syntax:
> >  > >     [from en: the source word]
> >  >
> >  > which your parser didn't like.
> >
> >  That's because it's parsing your proposed grammar, not mine.
> >
> >  > >  where the parser treats "[from" (or more accurately case insensitive
> >  > >  "[\s*from") as a token.  "the source word" is an arbitrary text string
> >  > >  not containing "]".
> >  >
> >  > But something like [[s_lang="en the source word"]] is doable?
> >
> >  Sure.  Easier actually.  I thought the other syntax might be easier
> >  for users because the keyword "from" is more mnemonic than the
> >  somewhat obscure "s_lang", the language is marked with a colon
> >  (a convention familiar from edict, wwwjdic, etc.), and no quotes are
> >  needed.
> 
> Actually [[s_lang="en: the source word"]]  would probably be better, just to
> emphasise to the user that the language code is needed.
> 
> "from" may well be better than "s_lang", but it might confuse with "trans"
> and "lit". (Many people get those confused.)

The last time this came up, 
http://tech.groups.yahoo.com/group/edict-jmdict/message/1503,
http://tech.groups.yahoo.com/group/edict-jmdict/message/1524
you were considering making trans a sense.lsource 
attribute, e.g.
  <lsource lang="en" translit="soapland">
rather than a tagged gloss 
  <gloss lang="en" translit>soapland</gloss>
or gloss-like element 
  <translit lang="en">soapland</translit>
(BTW, I think the last two are informationally equivalent 
and would have identical representations in the database.)
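
To illustrate what "informationally equivalent" means here, the rough
Python sketch below reduces both forms to the same record. The
function name gloss_record and the (lang, text, is_translit) tuple are
hypothetical, not the actual database layout, and the bare translit
attribute is written as translit="yes" only because a parser needs
well-formed XML.

```python
import xml.etree.ElementTree as ET


def gloss_record(xml_text):
    """Reduce a gloss-like element to (lang, text, is_translit).

    Hypothetical record shape, for illustration only.
    """
    elem = ET.fromstring(xml_text)
    lang = elem.get("lang")
    text = elem.text or ""
    if elem.tag == "translit":
        # Dedicated element: the tag name itself carries the flag.
        return (lang, text, True)
    if elem.tag == "gloss":
        # Tagged gloss: the presence of the attribute carries the flag.
        return (lang, text, elem.get("translit") is not None)
    raise ValueError("unexpected element: " + elem.tag)


tagged = gloss_record('<gloss lang="en" translit="yes">soapland</gloss>')
dedicated = gloss_record('<translit lang="en">soapland</translit>')
plain = gloss_record('<gloss lang="en">soapland</gloss>')
```

Both tagged and dedicated come out as the same tuple, while an ordinary
gloss differs only in the flag.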

If you did go with the first, then it would come down to
teaching people:
1. Use [from....] to specify the foreign language word 
    or pseudo-word a Japanese word was derived from.
  1a.  Use [from ... trans*] when that word is not a real
    word in the source language.
2. Use a gloss to provide the meaning of the Japanese 
    word in English (or the specified gloss language).
    May or may not be the same as the word given in 
    [from ...] (but will never be the same as [from ... trans].)
3. Use [lit] for a gloss that is an unusually word-for-word
    translation (but still a legitimate word/whatever in the
    gloss language).  [This is not expressed very clearly,
    but you get the idea, I hope.]

* -- Need to find better syntax.
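
For reference, a minimal sketch of how the plain [from en: the source
word] form could be recognised, assuming a two- or three-letter
language code followed by a colon. The names LSOURCE_RE and
parse_lsource are made up for this example, and the trans variant is
left out because its syntax is not settled.

```python
import re

# "[" + optional whitespace + case-insensitive "from", then a language
# code, a colon, and any text that does not contain "]".
LSOURCE_RE = re.compile(
    r"\[\s*from\s+([A-Za-z]{2,3})\s*:\s*([^\]]*?)\s*\]",
    re.IGNORECASE,
)


def parse_lsource(text):
    """Return (lang, source_word) for the first [from ...] tag, or None."""
    m = LSOURCE_RE.search(text)
    if m is None:
        return None
    # Normalise the language code to lower case; strip is done by the regex.
    return (m.group(1).lower(), m.group(2))
```

With this pattern, a tag that omits the colon is simply not matched, so
the caller can report it as a syntax error rather than guess where the
language code ends.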

Is that about right, or am I (likely, I'm afraid) among
the confused?

I suspect that documenting and teaching people this will 
be easier than teaching them some other things, like when
a gloss goes in an existing sense and when it is a new sense.