
Re: [edict-jmdict] Changes in Tanaka corpus format?



2008/1/28, Paul Blay <blay.paul@googlemail.com>:

Thanks for the quick reply.

> > I just noticed that the A-lines in the Tanaka corpus now end with #ID=nn
>
> Yes.  You can mostly blame (or thank) me for that idea.
/..snip../
> The trigger for having an ID in the first place was to help the
> maintainer of a multi-lingual example sentence site to keep his
> content synchronised with the current version of the Tanaka Corpus.
> (cf. Tatoeba Project).

Great. I wanted to know whether the change was permanent, so that I can
update my code to handle the new format, or only temporary, in which
case I would simply have waited it out.
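
For anyone else updating a parser, here is a minimal sketch of how both
the old and the new A-lines could be handled. It assumes that "nn" in
#ID=nn is a run of digits and that the suffix is the last thing on the
line; the function name split_a_line is only illustrative and is not
taken from any existing code.

```python
import re

# Assumption: the suffix is "#ID=" followed by digits at the end of the line.
_ID_SUFFIX = re.compile(r"#ID=(\d+)\s*$")


def split_a_line(line):
    """Return (text, sentence_id) for one A-line.

    sentence_id is None for old-format lines that carry no #ID= suffix.
    """
    match = _ID_SUFFIX.search(line)
    if match is None:
        # Old format: nothing to strip except the trailing newline.
        return line.rstrip("\n"), None
    # New format: cut the suffix off and return the ID as an integer.
    return line[:match.start()], int(match.group(1))
```

Because lines without the suffix pass through unchanged, the same code
should keep working if the IDs ever disappear again.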

I think it's a good idea to include IDs for the sentences, and I'll
probably make use of them myself in the future. That said, it would be
appreciated if format changes were announced a few days in advance
here on the mailing list :)

Best regards,
Kim Ahlström
Jisho.org