[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [edict-jmdict] "P" Markers - Google as corpus?



So the black triangle is by no
means an indicator of the way it is most commonly written.
[...]
As you said, you'd have to be in a real
pinch to go with this rule.

Personally, I prefer to think of 終焉 and 菠薐草 as the
"authoritative" versions of the words,

I think, if anything, the 公用文 style would be the
authoritative version while 終焉/菠薐草 would be the _academic_
version.  Not that there's anything wrong in having that
in a dictionary.

and I think that these should be listed as the "primary*
headwords in edict, but, as I mentioned before, I *definitely*
think that edict should contain both the complete kanji
headwords and the 公用文 style, which could probably be
generated (relatively) easily by someone with the proper
Perl/Python skills.

I agree with Stutzman, in that I don't see (P) tags as having
anything to do with the "primary" headwords.  So for example, a word
I recently submitted an amendment to (葡萄糖, "glucose") is, in
my experience, *always* written asブドウ糖 on food packages, and
has something like a 7:1 Google hit ratio compared to the kanji-only
form.  I would suggest that 葡萄糖 remain the primary headword
(after all, it's the one you'll find in the dictionary),

As always, I'm impressed by the extent of circular reasoning
in the linguistic community.  ;-)

while
ブドウ糖 would get the (P) tag.  The same would go for 菠
薐草/ほうれん草. To me, this seems to be the most logical
system, and the easiest way to remain consistent.

There is one important point you may have overlooked.  The case
of words which do _not_ have a (P) tag on any headword.

If I paraphrase your argument

"It's OK to have 菠薐草 as the first(primary) headword because
the (P) will be on the word actually used (ほうれん草)."

However if no (P) tags are involved you will obviously have
words where the primary headword includes some obscure kanji
which are rarely used in practise and there will be _no
indication_ that a later 'partial-kana' version is the standard
usage.

In any case I stand by my earlier position, which I may sum
up by saying that the primary headword should be one that
can be used without causing embarrassment. (I have to make
reluctant exception for (uk) tagged entries under the present
system).  In practical terms one order of magnitude difference
is no big deal, two or more is probably a bad idea.