
Re: [edict-jmdict] JMdict/EDICT update backlog




On Sep 23, 2007, at 8:05 PM, Jim Breen wrote:

On 24/09/2007, Jim Rose <jim@**********com> wrote:
> On Sep 23, 2007, at 10:32 AM, Paul Blay wrote:
> > Well the problem is review and validation. Are all 150,000 Tanaka
> > Corpus sentences going to be "pending review" ? I assumed they
> > would go in without being checked just because there are so many of
> > them.
>
> I've been wondering how "legal" it would be to build a corpus of
> sentences extracted from Japanese text books. From copyright.gov:
>
> "Under the fair use doctrine of the U.S. copyright statute, it is
> permissible to use limited portions of a work including quotes, for
> purposes such as commentary, criticism, news reporting, and scholarly
> reports. There are no legal rules permitting the use of a specific
> number of words, a certain number of musical notes, or percentage of
> a work. Whether a particular use qualifies as fair use depends on all
> the circumstances."

Since the Tanaka corpus was compiled in Japan, is currently being maintained
in the UK and is distributed from Australia, the minutiae of US copyright
law are a bit peripheral.


I think it's safe to say that most of the texts I would conceivably extract sentence pairs from, should I pursue this project, would be copyrighted in the US (Tuttle, U. of Hawaii, the remains of Weatherhill), and therefore the minutiae of US copyright law, as you say, would be of direct bearing - no?  After all, I'm living in a US territory, so any sentences I personally extract must be fair use, since by living here I am compelled to adhere to the laws here.



An issue that needs to be considered when taking a sentence pair from another
work is whether the textbook/whatever actually has copyright over it. From
what I have seen identical sentences crop up all over the place. It is pretty
plain that compilers of textbooks and dictionaries are pretty free with their
borrowings.


Good point.


That said, when I suggest a sentence based on material from another
source for addition to the Tanaka corpus, I always modify it. Usually I change
the Japanese a little, e.g. using a different noun here or there, and often I
retranslate part or all of it (which in the case of Eijiro-based sentences is
often necessary to have a grammatical English sentence.)


There's safety in variation drills.


[...]

> And of course, our government gives great advice on that page when it
> says:
>
> "The safest course is always to get permission from the copyright
> owner before using copyrighted material. The Copyright Office cannot
> give this permission."
>
> Any thoughts?

I think the Tanaka collection is pretty safe from a copyright point of
view, due to its massive dilution. I wouldn't go along with adding copyrighted
sentences with citations - I say stick to stuff that can safely be
placed in the Public Domain. If that means modifying material, do it.


OK.