More information

From EDRDG Wiki

The current handling of multiple languages has applied since June 2017. The following is some information provided by Jim Breen about it.

Previously, all the non-English translations, with the single exception of the French ones, were included at the end of the first English sense/translation. The French ones had been aligned quite some years ago, so they went into the indicated senses; however, that alignment has been broken in a number of cases.

That approach had some problems, as it took no account of the fact that about 10% of entries have multiple senses. It made no sense to have all the German translations of 掛ける tipped into the first of the 24 English senses.

Actually aligning the senses is a huge problem as the sources have been compiled quite independently. 掛ける has 15 senses in the German dictionary, for example. Also the non-English translations were being treated as a single sense each, although some have quite extensive sense-tagging in the form of "(1) ...... (2) ......", etc.

After quite a bit of discussion with people using the data, it was decided to:

  • as far as possible break the non-English translations into senses according to internal marking such as "(n)" or "n)";
  • add the non-English translations as separate senses after the English ones;
  • for the future, think about ways in which some form of sense-alignment could be done for the ~10% of polysemous entries.
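
The first two decisions can be sketched roughly as follows. This is only an illustration, not the code actually used to build the files: the marker pattern, the function names `split_senses` and `merge_entry`, the language codes and the sample strings are all assumptions made for the example.

```python
import re

# Inline sense markers such as "(1)" or "1)", at the start of the text or
# after whitespace. This pattern is an assumption; the real source files
# may use other conventions.
SENSE_MARKER = re.compile(r"(?:^|\s+)\(?\d+\)\s*")


def split_senses(text):
    """Split one translation string into senses at its inline markers."""
    parts = SENSE_MARKER.split(text)
    return [part.strip() for part in parts if part.strip()]


def merge_entry(english_senses, other_translations):
    """Return the English senses followed by the non-English ones, unaligned."""
    merged = [("eng", sense) for sense in english_senses]
    for lang, text in other_translations.items():
        merged.extend((lang, sense) for sense in split_senses(text))
    return merged


# Illustrative strings only, not taken from the dictionary files.
entry = merge_entry(
    ["to hang", "to sit down"],
    {"ger": "(1) aufhängen (2) sich setzen", "dut": "ophangen"},
)
for lang, sense in entry:
    print(lang, sense)
```

A translation with no markers comes back as a single sense, and nothing here attempts to match a non-English sense to a particular English one; each language's senses are simply appended after the English ones.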

We recognized that this is not ideal, but it's probably better than the previous arrangement, in which there was no sense splitting for the non-English translations and they were all treated as though they related to the first sense.

I did consider the possibility of handling the single-sense cases a bit differently, e.g. putting them all in together, but there were cases where the base English file had one sense but the Japanese-Dutch one had three.

So proper sense alignment is a problem for future consideration.