Re: [edict-jmdict] "noise" and "philosphy"...
Thanks, Jim, for some reassuring explanations. :-)
At 13:08 8/02/11 +1100, Jim Breen wrote:
>Hefei (place in China) seems to be written ホーフェイ in Japanese far more
>often than in its hanzi version (合肥).
And i just had to hit on it. :-)
>That "ヒト 【ひと】 human being " entry is from the Life Sciences file.
And i just had to hit on it. :-)
OK, as an aside, why would it be in there? Are there many katakana versions of words in there that normal people write in kanji?
>> This is something i wanted to ask you about anyway: what is the order in which part databases are queried?
>
>Actually a single file is used, containing EDICT, ENAMDICT and a heap of
>glossaries.
>The files that go into this file (glossdic) have a ranking tag, and where the
>same headword is found in more than one file, the lower ranked one(s)
>are dropped.
I see - and depending on the text, one may need one of the lower ranking items, but that gets to the issue of multiple outputs (more on that further down)...
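The ranking rule quoted above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the convention that rank 1 is the highest rank, and the sample entries are hypothetical, not the actual way glossdic is built.

```python
def merge_glossaries(files):
    """Merge (rank, entries) pairs into a single lookup table.

    Assumption: a smaller rank number means a higher-ranked file.
    When the same headword occurs in several files, only the entry
    from the highest-ranked file is kept; the others are dropped.
    """
    merged = {}
    for _rank, entries in sorted(files, key=lambda pair: pair[0]):
        for headword, gloss in entries.items():
            # setdefault keeps the first (highest-ranked) entry seen.
            merged.setdefault(headword, gloss)
    return merged


# Hypothetical sample data, for illustration only.
main_file = (1, {"A": "gloss from main file", "B": "main only"})
glossary = (2, {"A": "gloss from glossary", "C": "glossary only"})

table = merge_glossaries([glossary, main_file])
```

With this scheme the lower-ranked gloss for "A" is never visible, which is exactly why a text that needs it would require some form of multiple output.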
>> And what are the criteria that a partial solution from one part database
>> is skipped in favor of a complete solution from another one.
>
>Exact matches should always be preferred over partial ones.
>
>For example, if you give it 清水安静 it should select 清水 + 安静, not get
>a partial match on the name 清水安三.
I see... so if the (for me) obvious kanji combination does not come up, it usually means i need to send it in... which i have been doing lately...
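A minimal sketch of "exact matches before partial ones", assuming a greedy longest-exact-match scan over the text. The function name and the three-entry headword set are hypothetical; the real Translate Words code may work quite differently.

```python
def segment(text, headwords):
    """Split text into the longest exact headword matches, left to right.

    Characters that start no exact match are emitted one at a time.
    Partial matches (a headword that only shares a prefix with the
    text) are never selected.
    """
    max_len = max((len(w) for w in headwords), default=1)
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in headwords:
                pieces.append(candidate)
                i += n
                break
        else:
            # No exact match starts here: pass the character through.
            pieces.append(text[i])
            i += 1
    return pieces


# Hypothetical mini-dictionary built from the example in the quote.
headwords = {"清水", "安静", "清水安三"}
result = segment("清水安静", headwords)
```

Here the name 清水安三 shares three characters with the input but is not an exact match, so the scan falls back to 清水 and then finds 安静.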
>> And is there no way to offer more than one result, from more than one
>> part database, for a given item that is being queried? I guess the
>> latter could make the code unwieldy...
>
>More to the point, it would make the output huge.
If that is the only problem, could an option for multiple output be offered?
>> >I'm not sure that will give the best outcome for all possible users.
>> >ENAMDICT has ~75,000 entries beginning with katakana. Most of these are
>> >the katakanaized names of people or places.
>>
>> How many of those are katakanaized versions of Japanese or Chinese names
>> of people and places?
>
>Only a small proportion (almost none are katakanaized Japanese names.)
I am relieved... that day where i got two of them plus the ヒト must have been a statistical aberration...
At 08:38 8/02/13 +1100, Jim Breen wrote:
>OK. It's done. There are now checkbox options to exclude katakana and
>hiragana words/phrases in the Translate Words function. Also, whenever
>a "partial match" occurs in a katakana string, the rest of that string
>is skipped.
I missed the related discussion but am happy with the outcome. :-)
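For illustration, the two exclusion checkboxes could behave roughly like the filter below. This is a sketch under assumptions: the function and parameter names are hypothetical, kana are detected purely by Unicode block (hiragana U+3041 to U+309F, katakana U+30A0 to U+30FF), and the skipping of the rest of a katakana string after a partial match is not modelled.

```python
HIRAGANA = (0x3041, 0x309F)
KATAKANA = (0x30A0, 0x30FF)  # includes the long-vowel mark U+30FC


def in_block(word, block):
    """True if word is non-empty and every character lies in the block."""
    lo, hi = block
    return bool(word) and all(lo <= ord(ch) <= hi for ch in word)


def filter_words(words, skip_katakana=False, skip_hiragana=False):
    """Drop words written entirely in the excluded script(s)."""
    kept = []
    for word in words:
        if skip_katakana and in_block(word, KATAKANA):
            continue
        if skip_hiragana and in_block(word, HIRAGANA):
            continue
        kept.append(word)
    return kept


words = ["ヒト", "ひと", "人", "ホーフェイ"]
no_katakana = filter_words(words, skip_katakana=True)
no_kana = filter_words(words, skip_katakana=True, skip_hiragana=True)
```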
Any idea when those checkboxes will show on the TUFS interface?
(That is the server i usually use, since it is the closest...)
Does it require much extra work, meaning i should use the Monash server in case i want to use those checkboxes?
Thanks & regards: Hendrik