
Re: [edict-jmdict] xrefs in WWWJDIC

To: edict-jmdict@***************
Subject: Re: [edict-jmdict] xrefs in WWWJDIC
From: "Jim Breen" <jimbreen@*********>
Date: Sat, 26 May 2007 17:44:05 +1000

On 25/05/07, Stuart McGraw <smcg4191@frii.com> wrote:

Jim Breen wrote:
 > [pos="n,vs"]
 > [sense] bending
 >  indentation
 > [sense] refraction
 > [sense] inflection

 Or how about a syntax more familiar to wwwjdic/edict users?:

 (1) [n,vs] bending; indentation
 (2) refraction
 (3) inflection


Hmmm. I don't like the idea of locking ";" in to only meaning a gloss-
separator, but I guess it could be made escapable, e.g. use \; to put
";" within a gloss.
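
To make the escaping idea concrete, here is a minimal Python sketch of splitting a sense into glosses on unescaped ";" only. The function name split_glosses is just an illustration, and the sketch assumes "\;" is the only escape sequence (a literal backslash directly before a separator is not handled).

```python
import re

def split_glosses(text):
    """Split a sense's text into glosses on unescaped ';' characters."""
    # (?<!\\); matches a ';' that is NOT preceded by a backslash.
    parts = re.split(r'(?<!\\);', text)
    # Turn each escaped '\;' back into ';', trim whitespace, drop empty pieces.
    return [p.replace('\\;', ';').strip() for p in parts if p.strip()]

print(split_glosses("bending; indentation"))
print(split_glosses(r"wink \;-); smiley"))
```

The second call keeps the escaped ";" inside the first gloss and splits only at the unescaped one.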

Starting a gloss with "([0-9])" may be safe (it doesn't happen at present),
but I'd hate to rule it out. Maybe [1] would be better.

 I don't think a pure edict format is good because of ambiguities
 but it might be possible to come up with something very similar
 that is unambiguously parsable and is not too rigid.

 If no glosses can start with "[" then brackets could be used
 to identify sense tags, as above.  The misc, pos, and field
 tags are all unique across all three groups so could be freely
 intermixed, with or without their own sets of brackets:

 [n,vs][col][comp]
    [n,vs,col,comp]
    [col,vs][comp][n]

 are all unambiguous and would free the submitter from needing
 to remember, "pos tags go before misc tags".  Of course this
 would need a commitment to continue to maintain such uniqueness
 among the tags which may not be desirable.  The misc and pos
 tags seem unlikely to grow much but the field tags could.
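
 As a rough sketch of the freely intermixed brackets idea, the Python below sorts tags into their groups purely by lookup, which only works while the groups stay disjoint. The three tag sets are tiny illustrative subsets and the function name classify_tags is hypothetical; none of this is taken from existing wwwjdic code.

```python
import re

# Tiny illustrative subsets of each group; the real tag lists are much longer.
POS = {"n", "vs", "exp"}
MISC = {"col"}
FIELD = {"comp"}

def classify_tags(text):
    """Read the leading [..] groups of a sense and sort each tag by group."""
    result = {"pos": [], "misc": [], "field": []}
    # Only brackets at the very start count; glosses are assumed not to start with '['.
    leading = re.match(r'\s*((?:\[[^\]]*\]\s*)*)', text).group(1)
    for group in re.findall(r'\[([^\]]*)\]', leading):
        for tag in group.split(","):
            tag = tag.strip()
            if tag in POS:
                result["pos"].append(tag)
            elif tag in MISC:
                result["misc"].append(tag)
            elif tag in FIELD:
                result["field"].append(tag)
            else:
                raise ValueError(f"unknown tag: {tag!r}")
    # Sort so that the order the submitter typed the tags in does not matter.
    return {name: sorted(tags) for name, tags in result.items()}

for form in ("[n,vs][col][comp]", "[n,vs,col,comp]", "[col,vs][comp][n]"):
    print(classify_tags(form + " bending"))
```

 All three spellings give the same result, but a tag that ever appeared in two groups would silently land in whichever group is checked first.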


Yes, at present the PoS, misc, etc. tags don't overlap, but I doubt
that could, or should, be maintained forever. I rather like the
[name="values"] approach.

 Some syntax is needed for stagr, stagk info.  Maybe some
 Japanese text and the word "only" inside brackets is enough.


How about [restr="かなことば"] or [restr="漢字言葉"] ?

 One ambiguity in edict is between s_info and gloss:
 Is "parenthesized text" in

 (1) (parenthesized text) gloss

 part of the gloss as in

 [1000610] "いい年をして 【いいとしをして】 (exp) (in spite of) being old enough to know better"

 or a sense.s_info comment as in:

 [1565020] "吮癰舐痔 【せんようしじ】 (exp) (Chinese four-character phrase) brown-nosing;...

 So that would need working out.


いい年をして wouldn't change, as the "(in spite of)" is part of the gloss.
For the 吮癰舐痔 entry, you could have:

[pos="exp",sensinf="Chinese four-character phrase"] brown-nosing;.....

<sens_inf> is a bit of a hack. I wanted a field where text containing
Japanese could go, but which would not carry through to the EDICT
form (which can't have Japanese in the "English" region).
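
For illustration, a [name="value",...] prefix like the ones above could be pulled apart along these lines. This is only a sketch: the function name parse_sense is made up, and it assumes values never contain a double quote and that the bracket, if present, comes first on the line.

```python
import re

# A leading [...] holding one or more name="value" pairs, then the gloss text.
SENSE = re.compile(r'^\[((?:\s*\w+="[^"]*"\s*,?)+)\]\s*(.*)$')
PAIR = re.compile(r'(\w+)="([^"]*)"')

def parse_sense(line):
    """Return (attributes, gloss_text) for one sense line."""
    line = line.strip()
    m = SENSE.match(line)
    if not m:
        # No attribute bracket: the whole line is gloss text.
        return {}, line
    return dict(PAIR.findall(m.group(1))), m.group(2)

print(parse_sense('[pos="exp",sensinf="Chinese four-character phrase"] brown-nosing'))
print(parse_sense('(in spite of) being old enough to know better'))
```

Because the values are quoted, commas inside them (as in pos="n,vs") and Japanese text (as in the restr examples) need no special handling, and a parenthesized start of a gloss is left alone.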

 > This is a bit like Wikimedia's style. It may well be better than having a myriad
 > of boxes.

 I wondered about a parsed text approach because it is already
 done to some degree on the current wwwjdic new/amend form
 (e.g. sense numbers to distinguish different senses), and I can't
 think of a good way to provide separate boxes without having, as
 you say, a myriad of them, most of which will never be used,
 or some fancy javascript/ajax or similar.  The latter could do
 things like hiding boxes or providing a new box (e.g. kanji) when
 the existing box was filled out.  But that approach would probably
 take me about 3 years to implement. :-)


Pavel's Ruby-on-Rails prototype had all those hidden fields that
opened up when clicked on. Effective, but a bit scary.

I'm a bit torn on this. I suspect having PoSs, etc. as parseable
thingos inside a text box is likely to be the fastest way to get
a working system. OTOH when I let users type in their own PoS codes
from a supplied list, I got all sorts of garbage. Only the drop-down
PoS lists fixed that. Still, if the input is parsed and the user gets an
immediate response with errors highlighted, it may well work.
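
The kind of check I have in mind might look like the Python sketch below: each typed code is looked up in the supplied list and anything unknown is reported back instead of being accepted. KNOWN_POS is a small illustrative subset and check_pos is a hypothetical name.

```python
# Illustrative subset only; the real PoS list is far longer.
KNOWN_POS = {"n", "vs", "exp", "adv", "v1"}

def check_pos(value):
    """Return (valid_codes, error_messages) for a value such as 'n,vs'."""
    valid, errors = [], []
    for code in value.split(","):
        code = code.strip()
        if code in KNOWN_POS:
            valid.append(code)
        else:
            # Collect every problem so the form can highlight them all at once.
            errors.append(f"unknown PoS code: {code!r}")
    return valid, errors

print(check_pos("n, vs"))
print(check_pos("n,noun"))
```

Returning the errors rather than raising on the first one lets the form show all the bad codes in a single response.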

So, do we proceed with a 3-text-box model? I'd be prepared to give it a go.

Jim
--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/
