More information

From EDRDG Wiki

The current handling of multiple languages has been in place since June 2017. The following information about it was provided by Jim Breen.

Previously, all the non-English translations, with the single exception of the French ones, were included at the end of the first English sense/translation. The French ones had been aligned quite some years ago, so they went into the indicated senses; however, the alignment has been broken in a number of cases.

That approach had some problems, as it took no account of the fact that about 10% of entries have multiple senses. It made no sense to have all the German translations of 掛ける tipped into the first of the 24 English senses.

Actually aligning the senses is a huge problem as the sources have been compiled quite independently. 掛ける has 15 senses in the German dictionary, for example. Also the non-English translations were being treated as a single sense each, although some have quite extensive sense-tagging in the form of "(1) ...... (2) ......", etc.

After quite a bit of discussion with people using the data, it was decided to:

  • as far as possible, break the non-English translations into senses according to internal marking such as "(n)" or "n)";
  • add the non-English translations as separate senses after the English ones; and
  • for the future, think about ways in which some form of sense-alignment could be done for the ~10% of polysemous entries.
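
The first two points can be illustrated with a minimal Python sketch. It is not the code actually used to build the dictionary files: the function names `split_senses` and `append_translations`, the marker pattern, the dictionary layout and the sample glosses are all assumptions made up for illustration. It splits a translation string on numbered markers of the form "(n)" or "n)" and then appends the resulting pieces as separate senses after the English ones.

```python
import re

# Assumed marker shapes: "(1)", "(2)", ... or "1)", "2)", ...
# The bare "n)" form is only accepted at the start of the string or after
# whitespace. This is a naive pattern; real data may need more care.
_MARKER = re.compile(r"(?:\(\d+\)|(?<!\S)\d+\))\s*")


def split_senses(gloss: str) -> list[str]:
    """Split one translation string into senses on numbered markers.

    A string with no markers comes back as a single sense.
    """
    parts = (part.strip() for part in _MARKER.split(gloss))
    return [part for part in parts if part]


def append_translations(english_senses: list[str],
                        translations: dict[str, str]) -> list[dict[str, str]]:
    """Return the English senses followed by the non-English ones.

    No alignment is attempted: each non-English sense is simply added
    as a separate sense after the English senses.
    """
    senses = [{"lang": "eng", "gloss": gloss} for gloss in english_senses]
    for lang, gloss in translations.items():
        for part in split_senses(gloss):
            senses.append({"lang": lang, "gloss": part})
    return senses


if __name__ == "__main__":
    # Made-up sample data, not taken from the actual dictionary files.
    entry = append_translations(
        ["to hang up", "to make a call"],
        {"ger": "(1) hängen (2) anrufen; telefonieren"},
    )
    for number, sense in enumerate(entry, start=1):
        print(number, sense["lang"], sense["gloss"])
```

In this sketch the order of the output list stands in for the sense numbering, so the non-English senses always receive numbers higher than the English ones and no claim is made about which English sense they correspond to.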

We recognize that this is not ideal, but it is probably better than the previous approach, in which there was no sense splitting for the non-English translations and they were all treated as though they related to the first sense.

I did consider the possibility of handling the single-sense cases a bit differently, e.g. putting them all in together, but there were cases where the base English file had one sense but the Japanese-Dutch one had three.

So proper sense alignment is a problem for future consideration.