
Re: [edict-jmdict] JMdict internationalization effort - let's (finally) do it!



Hi Jim,

> I have looked at the Transifex site, and I must say its usage doesn't
> seem very intuitive. Also I see multi-sense JMdict entries display as
> multiple entries, e.g. お使い. I hope that's not a problem.

On the contrary, I found it very efficient for translating
(everything can be controlled with the keyboard). What exactly did you
find unintuitive?

About the multi-sense entries, this is the intended behavior. The idea
is that every definition comes with one gloss per line, and every
sense has its own entry. This is to avoid having several senses
grouped into one, and also to limit the semantic scope of every sense
to what it is supposed to be (as stated in
http://www.tagaini.net/jmdict-i18n )
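
To illustrate the idea, here is a minimal sketch in Python of how a
multi-sense entry could be split into one translation unit per sense,
with one gloss per line. The Entry and Sense classes, the "seq:sense"
key format and the sample data are assumptions made for illustration,
not the actual jmdict-i18n scripts:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sense:
    glosses: List[str]  # English glosses of one sense


@dataclass
class Entry:
    seq: int            # JMdict sequence number
    senses: List[Sense]


def split_into_sense_units(entry: Entry) -> List[Tuple[str, str]]:
    """Return one (key, source_text) pair per sense of the entry.

    The key combines the sequence number and the 1-based sense index
    (a hypothetical convention); the source text holds one gloss per line.
    """
    units = []
    for index, sense in enumerate(entry.senses, start=1):
        key = f"{entry.seq}:{index}"
        units.append((key, "\n".join(sense.glosses)))
    return units


# Placeholder entry with two senses (not real JMdict data).
example = Entry(seq=1, senses=[Sense(["gloss a", "gloss b"]), Sense(["gloss c"])])
units = split_into_sense_units(example)
```

With this layout a translator never sees two senses mixed in the same
unit, which is the point of showing them as separate entries.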

> It would be good to see the JMdict entries with their French glosses.
> At present it all seems to be broken up by JLPT levels, which I find
> difficult to navigate.

For practical reasons we need to split the huge dictionaries into more
human-sized modules, to which we can give priority for translating.
The advantages are:
- people know what should be translated first (more common entries,
e.g. JLPT 5 and up)
- once the most important modules are translated, regressions are easy
to track since their translation percentage will drop from 100% to 99%
or so - translators can then act quickly upon this
- not-so-common but themed modules can be adopted by people who are
experts in the field.
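
The regression tracking mentioned above can be sketched as follows. This
is only an illustration in Python: the module names and the idea of
comparing two snapshots of per-module percentages are assumptions, not
something Transifex provides in this form:

```python
from typing import Dict, List


def completion_percent(translated: int, total: int) -> float:
    """Percentage of translated units in a module (an empty module counts as done)."""
    if total == 0:
        return 100.0
    return 100.0 * translated / total


def find_regressions(previous: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Return the modules that were fully translated before and no longer are."""
    return sorted(
        name
        for name, percent in current.items()
        if previous.get(name) == 100.0 and percent < 100.0
    )


# Hypothetical snapshots: module name -> completion percentage.
before = {"jlpt5": 100.0, "jlpt4": 80.0}
after = {"jlpt5": completion_percent(99, 100), "jlpt4": 85.0}
regressed = find_regressions(before, after)
```

Here only the module that dropped from 100% is reported, so translators
know exactly where new or changed entries need attention.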

The classification by JLPT is totally arbitrary, and I can replace it
with something else if a better idea is proposed. IMHO it has the
advantage that material used by students gets translated first.

> Let's get the French part working. Some of the other languages can
> probably be folded in later as they are not part of ongoing projects. Others,
> such as Japanese-German, are active projects, and it will be a matter
> of working out a way of enhancing the data input. I have discussed with
> Ulrich the possibility of putting JMdict sequence numbers into the Wadoku
> database, for example.

That would be nice - however, you would not have per-sense matching,
so the translation would remain approximate no matter how good
Wadoku itself is.

> Coming back to kanjidic2, as you know the initial French translations
> were done by Alain Thierion, and Alain is pressing on with the
> translations, taking them beyond just the 常用漢字. Also he is not
> just translating the English meanings; he is going to other sources
> including several 漢和字典. He and I are collaborating on this and I have
> been correcting the meanings in kanjidic when he finds errors. I don't
> know the state of the kanjidic2 French translations in the Transifex system,
> but for now I think I want to stay with Alain's translations, as I
> have confidence in what he is producing.

There have been some contributions to the French kanjidic2, and
although they cannot compete with Alain's work in terms of quality,
I'd like to see them used when possible. Maybe you can merge the
jmdict-i18n translations first, then overwrite them with Alain's.
That way, a jmdict-i18n translation would remain only for entries
that Alain has not translated himself, until he proposes a better one
(AFAIK people did not modify any of the existing translations anyway,
so the jmdict-i18n contributions should be about yet-uncovered kanji).
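
To make the merge order concrete, here is a minimal sketch in Python. It
assumes both sources can be loaded as a mapping from kanji to a list of
French meanings; the function name and the sample data are hypothetical
and only illustrate the "merge first, then override" idea:

```python
from typing import Dict, List


def merge_kanji_translations(
    community: Dict[str, List[str]],
    preferred: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Start from the community translations, then let the preferred
    source (here, Alain's) replace any entry it actually covers."""
    merged = dict(community)
    for kanji, meanings in preferred.items():
        if meanings:  # an empty list means "not translated": keep the community one
            merged[kanji] = meanings
    return merged


# Placeholder data, not taken from either real source.
from_i18n = {"A": ["community a"], "B": ["community b"]}
from_alain = {"A": ["alain a"], "C": ["alain c"], "B": []}
merged = merge_kanji_translations(from_i18n, from_alain)
```

In this example "A" ends up with Alain's meaning, "B" keeps the
community one because Alain has none for it, and "C" comes from Alain
alone.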

Actually, I contacted Alain when I launched the effort and suggested
that he continue his work directly in Transifex - he said he wanted
to see where this was going first. Now that we are heading to a
direct, daily (?) integration of jmdict-i18n into JMdict, maybe he
could tell us his thoughts if he is reading this. Simplifying the
workflow would be good for everybody (and especially Jim), and since
JMdict is a collaborative project it probably makes more sense to
work as a community than to work alone and submit batches of new
translations once in a while. Not to mention that, as a linguist,
Alain could also give guidance to other people, if he so wishes.

So, if I understood everything correctly, the status with respect to
jmdict-i18n will be:
- French JMdict translations are going to be exclusively taken from
jmdict-i18n, since it is the only moving and active source.
- Other JMdict languages will continue to be merged as they are today for now
- (subject to Jim's approval) kanjidic2 entries will also be merged,
then overridden by Alain's translations whenever relevant
- Translations will be made available in the format described by Jim,
on a public server, and will be updated regularly

Could you confirm that my understanding is correct? Then I will adapt
my scripts and start releasing data for you to integrate. Hopefully
this can be hacked together in a weekend or two.

I would also like to remind you that one person is currently doing
amazing work translating kanjidic2 entries into Italian. He is about
to finish all jouyou kanji, and the translation quality seems to be
quite good. Could you also consider his work for integration?

Thanks,
Alex.