
Re: [edict-jmdict] "noise" and "philosphy"...



Thanks, Jim, for some reassuring explanations. :-)

At 13:08 8/02/11 +1100, Jim Breen wrote:
>Hefei (place in China) seems to be written ホーフェイ in Japanese far more
>often than in its hanzi version (合肥).

And i just had to hit on it. :-)

>That "ヒト 【ひと】 human being " entry is from the Life Sciences file.

And i just had to hit on it. :-)
OK, as an aside, why would it be in there? Are there many katakana versions of words in there that normal people write in kanji?

>>  This is something i wanted to ask you about anyway: what is the order in which part databases are queried?
>
>Actually a single file is used, containing EDICT, ENAMDICT and a heap of
>glossaries.
>The files that go into this file (glossdic) have a ranking tag, and where the
>same headword is found in more than one file, the lower ranked one(s)
>are dropped.

I see - and depending on the text, one may need one of the lower ranking items, but that gets to the issue of multiple outputs (more on that further down)...
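
To check my understanding of the ranking, here is a minimal Python sketch of that merge. The function name, the numeric ranks (lower number = higher rank) and the toy entries are all my own assumptions, not the actual glossdic build:

```python
def merge_by_rank(sources):
    """Merge several headword -> gloss files into one dictionary.

    `sources` is a list of (rank, entries) pairs. The rank values are a
    hypothetical stand-in for the ranking tag: 1 outranks 2, and so on.
    """
    merged = {}
    # Visit the highest-ranked file first so that its entries win.
    for rank, entries in sorted(sources, key=lambda pair: pair[0]):
        for headword, gloss in entries.items():
            # setdefault keeps an entry that is already present, so the
            # same headword from a lower-ranked file is dropped.
            merged.setdefault(headword, (gloss, rank))
    return merged


# Toy data, invented for illustration only.
general = (1, {"清水": "spring water", "安静": "rest"})
names = (2, {"清水": "Shimizu (surname)", "清水安三": "Shimizu Yasuzou"})
merged = merge_by_rank([names, general])
```

With this toy data, 清水 keeps only the gloss from the higher-ranked file, which is exactly the case where a text about people would have needed the dropped one.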

>> And what are the criteria that a partial solution from one part database
>> is skipped in favor of a complete solution from another one.
>
>Exact matches should always be preferred over partial ones.
>
>For example, if you give it 清水安静 it should select 清水 + 安静, not get
>a partial match on the name 清水安三.

I see... so if the (for me) obvious kanji combination does not come up, it usually means i need to send it in... which i have been doing lately...
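
The way I picture "exact before partial" is roughly the sketch below. It is only my guess at the idea, not the real Translate Words code; the longest-match strategy, the definition of a partial match as a shared prefix, and all names are assumptions:

```python
def common_prefix_len(a, b):
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def segment(text, lexicon):
    """Split text into (kind, headword) pairs, preferring exact matches."""
    out = []
    i = 0
    while i < len(text):
        # First try the longest exact match starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                out.append(("exact", text[i:j]))
                i = j
                break
        else:
            # No exact match here: fall back to the entry that shares the
            # longest prefix with the remaining text (a "partial" match).
            rest = text[i:]
            best = max(sorted(lexicon),
                       key=lambda w: common_prefix_len(w, rest),
                       default=None)
            k = common_prefix_len(best, rest) if best else 0
            if k:
                out.append(("partial", best))
                i += k
            else:
                out.append(("unknown", text[i]))
                i += 1
    return out


full = segment("清水安静", {"清水", "安静", "清水安三"})
names_only = segment("清水安静", {"清水安三"})
```

Here `full` comes out as 清水 + 安静, both exact, while `names_only` can only offer the partial match on 清水安三. That second case is the one where the missing entry needs to be sent in.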

>> And is there no way to offer more than one result, from more than one
>> part database, for a given item that is being queried? I guess the
>> latter could make the code unwieldy...
>
>More to the point, it would make the output huge.

If that is the only problem, could an option for multiple output be offered?

>>  >I'm not sure that will give the best outcome for all possible users.
>>  >ENAMDICT has ~75,000 entries beginning with katakana. Most of these are
>>  >the katakanaized names of people or places.
>>
>>  How many of those are katakanaized versions of Japanese or Chinese names
>> of people and places?
>
>Only a small proportion (almost none are katakanaized Japanese names.)

I am relieved... that day where i got two of them plus the ヒト must have been a statistical aberration...

At 08:38 8/02/13 +1100, Jim Breen wrote:
>OK. It's done. There are now checkbox options to exclude katakana and
>hiragana words/phrases in the Translate Words function. Also, whenever
>a "partial match" occurs in a katakana string, the rest of that string
>is skipped.

I missed the related discussion but am happy with the outcome. :-)
Any idea when those checkboxes will show up on the TUFS interface?
(That is the server i usually use, since it is the closest one...)
Does it require much extra work, meaning i should use the Monash server if i want to use those checkboxes?
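
As for what the two checkboxes do, my mental model is the small Python sketch below. The function names and the option names are made up, and the test is simply whether every character falls in the Unicode Hiragana (U+3040 to U+309F) or Katakana (U+30A0 to U+30FF) block; the real server code may well differ:

```python
def is_katakana(word):
    """True if word is non-empty and entirely in the Katakana block."""
    return bool(word) and all("\u30a0" <= ch <= "\u30ff" for ch in word)


def is_hiragana(word):
    """True if word is non-empty and entirely in the Hiragana block."""
    return bool(word) and all("\u3040" <= ch <= "\u309f" for ch in word)


def filter_words(words, exclude_katakana=False, exclude_hiragana=False):
    """Drop pure-katakana and/or pure-hiragana words, keep everything else."""
    kept = []
    for w in words:
        if exclude_katakana and is_katakana(w):
            continue
        if exclude_hiragana and is_hiragana(w):
            continue
        kept.append(w)
    return kept


words = ["ホーフェイ", "ひと", "合肥"]
no_kana = filter_words(words, exclude_katakana=True, exclude_hiragana=True)
```

With both options on, only 合肥 survives from that list, so neither ホーフェイ nor a kana-only ひと would show up in the output.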

Thanks & regards: Hendrik



