Re: [edict-jmdict] JMdict internationalization effort - let's (finally) do it!



On 20 January 2012 12:21, Alexandre Courbot <gnurou@gmail.com> wrote:

>> I have looked at the Transifex site, and I must say its usage doesn't
>> seem very intuitive. Also I see multi-sense JMdict entries display as
>> multiple entries, e.g. お使い. I hope that's not a problem.
>
> On the contrary I found it very efficient to translate stuff easily
> (everything can be controlled with the keyboard). What are your
> complaints exactly?

I eventually found my way to where one translates. The Tagaini page
helped.

>> Let's get the French part working. Some of the other languages can
>> probably be folded in later as they are not part of ongoing projects. Others,
>> such as Japanese-German, are active projects, and it will be a matter
>> of working out a way of enhancing the data input. I have discussed with
>> Ulrich the possibility of putting JMdict sequence numbers into the Wadoku
>> database, for example.
>
> That would be nice - however you would not have a per-sense matching
> and thus the translation will remain approximate, no matter how good
> the Wadoku is by itself.

Per-sense or per-gloss? It would be nice to get the senses aligned eventually.
In fact it would be a very nice NLP project to attempt such an alignment
automatically.

>> Coming back to kanjidic2, as you know the initial French translations
>> were done by Alain Thierion, and Alain is pressing on with the
>> translations, taking them beyond just the 常用漢字. Also he is not
>> just translating the English meanings; he is going to other sources
>> including several 漢和字典. He and I are collaborating on this and I have
>> been correcting the meanings in kanjidic when he finds errors. I don't
>> know the state of the kanjidic2 French translations in the Transifex system,
>> but for now I think I want to stay with Alain's translations, as I
>> have confidence in what he is producing.
>
> There have been some contributions in the French kanjidic2, and
> although they could not compete with Alain's work in terms of quality
> I'd like to see them used when possible - maybe you can merge the
> jmdict-i18n translations first, then overwrite them with Alain's -
> that way, only entries that Alain did not translate himself would
> remain until he proposes a better translation (AFAIK people did not
> modify any of the existing translations anyway, so the jmdict-i18n
> contributed ones would be about yet-uncovered kanji).

What I'll do is use Alain's translations as the primary source, and fall
back on the others only where Alain has no translation available.
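
As a rough illustration only, the priority rule could be sketched like
this. It is not the actual kanjidic2 build mechanism; the function name,
the dict-of-lists data shape and the sample glosses are all assumptions
made up for the example.

```python
def merge_translations(primary, fallback):
    """Merge two {kanji: [meanings]} mappings (hypothetical data shape).

    Entries from `primary` always win; `fallback` entries are kept only
    for kanji that `primary` does not cover.
    """
    merged = dict(fallback)  # start from the community contributions
    for kanji, meanings in primary.items():
        if meanings:  # ignore empty primary entries
            merged[kanji] = meanings
    return merged


# Made-up sample data, for illustration only.
alain = {"日": ["jour", "soleil"], "月": ["lune", "mois"]}
community = {"月": ["lune"], "龍": ["dragon"]}

merged = merge_translations(alain, community)
for kanji, meanings in merged.items():
    print(kanji, ", ".join(meanings))
```

Here 月 keeps the primary translation, while 龍 survives from the other
source because the primary has nothing for it.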

> So, if I got everything correctly, the status with respect to
> jmdict-i18n will be:
> - French JMdict translations are going to be exclusively taken from
> jmdict-i18n, since it is the only moving and active source.

Yes.

> - Other JMdict languages will continue to be merged as they are today for now

For now, yes.

> - (subject to Jim's approval) kanjidic2 entries will also be merged,
> then overridden by Alain's translations whenever relevant

The actual mechanism will be a bit different, but the result will be much the
same.

> - Translations will be made available in the format described by Jim,
> on a public server, and will be updated regularly

That would be good.

> Could you confirm my understanding is correct? Then I will adapt my
> scripts and start releasing data for you to integrate. Hopefully this
> can be hacked in one weekend or two.

Some sample ones for JMdict would be good. I assume you have all the
ones from Jean-Marc's dico in there already.

> I would also like to remind you that one person is currently doing
> amazing work translating kanjidic2 entries into Italian. He is about
> to finish all jouyou kanji, and the translation quality seems to be
> quite good. Could you also consider his work for integration?

Absolutely. It would be very valuable.

Cheers

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Clayton School of IT, Monash University
Webmaster: Hawthorn Rowing Club, Treasurer: Japanese Studies Centre
Graduate student: Language Technology Group, University of Melbourne