
RE: [edict-jmdict] Re: Language codes (was: a few more jmdict errors)



Jim Breen wrote:
On 09/04/2008, Stuart McGraw <smcg4191@frii.com> wrote:
> Jim Breen wrote (and Jean-Luc Leger wrote similarly):
>  > >  The Algonquin language uses the code "alq", not "alg". :-)
>  >
>  > Not according to
>  > http://www.loc.gov/standards/iso639-2/php/code_list.php
>  > which is the "home" of that standard.
>
>  Ah, I see what happened.  You were using ISO-639-2 (1998); I was
>  using ISO-639-3 (2007), http://www.sil.org/iso639-3/default.asp
>  (downloadable code table at http://www.sil.org/iso639-3/download.asp),
>  which defines Algonquin as "alq".
>
>  It is surprising to me that an already established code would be
>  changed, but that seems to be the case.

It's one of several codes where 639-2 and 639-3 differ; ara->arb is
another. It's a result of a different treatment of language families,
which is partly why there are two standards.

I looked at the Wikipedia article on ISO 639 and a little at some of
the docs at the SIL site, but my eyes started to glaze over pretty fast.
Is the intention that there be two different standards, or that -3 is
a successor to -2?  (Of course intent and reality are sometimes
different.)

>  Any reason for preferring -2 to -3?  The latter has quite a few
>  more languages and you can never tell when you might want to include
>  Zeem language glosses in JMdict. :-)  Its downloadable table also
>  seems to have more information (e.g. language type and scope columns),
>  though that's not directly relevant to jmdict use.

The reason I preferred 639-2 was that it has B codes, whereas
639-3 only has T codes. For most users of EDICT/JMdict I think
chi, ger and tib are likely to be more useful than zho, deu and bod.

Excerpted from the ISO 639-3 code table at
http://www.sil.org/iso639-3/download.asp
Id      Part2B  Part2T  Part1   Scope   Type    Ref_Name
zho     chi     zho     zh      M       L       Chinese
deu     ger     deu     de      I       L       German
bod     tib     bod     bo      I       L       Tibetan
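Because the 639-3 table carries a Part2B column alongside the Id, a program reading it could still emit the B codes Jim prefers (chi, ger, tib) wherever one exists and fall back to the 639-3 Id otherwise. A minimal Python sketch, assuming a tab-delimited file with the column names shown in the excerpt (the real file's header may differ); the function name preferred_codes is hypothetical:

```python
import csv
import io

# Rows copied from the excerpt above. Assumption: the file is tab-delimited
# with these header names; check the header of the actual download.
SAMPLE = (
    "Id\tPart2B\tPart2T\tPart1\tScope\tType\tRef_Name\n"
    "zho\tchi\tzho\tzh\tM\tL\tChinese\n"
    "deu\tger\tdeu\tde\tI\tL\tGerman\n"
    "bod\ttib\tbod\tbo\tI\tL\tTibetan\n"
)


def preferred_codes(table_text: str) -> dict[str, str]:
    """Map each ISO 639-3 Id to its 639-2 B code, falling back to the Id.

    Languages with no 639-2 code leave Part2B empty, so they keep
    their 639-3 Id (e.g. "alq" for Algonquin).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return {row["Id"]: (row["Part2B"] or row["Id"]) for row in reader}


codes = preferred_codes(SAMPLE)
print(codes["zho"], codes["deu"], codes["bod"])  # chi ger tib
```

Keying on the 639-3 Id keeps every language in the table addressable, while the output uses the B code only where 639-2 actually defines one.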

I don't know if the "Part2B" column is technically part of the
standard or supplementary information, but as it is distributed
officially with it, it seems like rather a fine point.  (I did
not, however, look at all the entries to see whether all the B codes
were the same in -3 as in -2.)
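The check left undone above (do the B codes in the -3 table match those in -2?) is mechanical enough to script. A rough Python sketch, keyed on the T code. Assumptions: the -2 data follows the Library of Congress pipe-delimited layout (B code|T code, blank when equal to B|alpha-2|English|French), and the -3 data uses the column names from the excerpt; the sample rows and function names are hypothetical:

```python
import csv
import io

# Assumed LoC ISO 639-2 layout: B|T (blank if same as B)|alpha-2|English|French
SAMPLE_639_2 = (
    "chi|zho|zh|Chinese|chinois\n"
    "ger|deu|de|German|allemand\n"
    "tib|bod|bo|Tibetan|tibétain\n"
)

# Rows from the ISO 639-3 excerpt quoted above (tab-delimited).
SAMPLE_639_3 = (
    "Id\tPart2B\tPart2T\tPart1\tScope\tType\tRef_Name\n"
    "zho\tchi\tzho\tzh\tM\tL\tChinese\n"
    "deu\tger\tdeu\tde\tI\tL\tGerman\n"
    "bod\ttib\tbod\tbo\tI\tL\tTibetan\n"
)


def b_codes_639_2(text: str) -> dict[str, str]:
    """Map each 639-2 T code to its B code (T defaults to B when blank)."""
    codes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        b, t = line.split("|")[:2]
        codes[t or b] = b
    return codes


def b_codes_639_3(text: str) -> dict[str, str]:
    """Map each non-empty Part2T code in the 639-3 table to its Part2B code."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Part2T"]: row["Part2B"] for row in reader if row["Part2T"]}


def b_code_mismatches(p2: dict[str, str], p3: dict[str, str]) -> list[tuple]:
    """Return (T code, -2 B code, -3 B code) for every disagreement.

    None means the code is absent from that table; 639-2 collective
    codes such as "alg" show up this way, since 639-3 omits them.
    """
    return [
        (t, p2.get(t), p3.get(t))
        for t in sorted(p2.keys() | p3.keys())
        if p2.get(t) != p3.get(t)
    ]


print(b_code_mismatches(b_codes_639_2(SAMPLE_639_2), b_codes_639_3(SAMPLE_639_3)))
# []
```

Run against the full downloads rather than these three rows, any entries where the two tables disagree would be listed, along with the collective codes that only 639-2 defines.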