On Sep 23, 2007, at 8:05 PM, Jim Breen wrote:
On 24/09/2007, Jim Rose <jim@**********com> wrote:
> On Sep 23, 2007, at 10:32 AM, Paul Blay wrote:
> > Well the problem is review and validation. Are all 150,000 Tanaka
> > Corpus sentences going to be "pending review" ? I assumed they
> > would go in without being checked just because there are so many of
> > them.
>
> I've been wondering how "legal" it would be to build a corpus of
> sentences extracted from Japanese text books. From copyright.gov:
>
> "Under the fair use doctrine of the U.S. copyright statute, it is
> permissible to use limited portions of a work including quotes, for
> purposes such as commentary, criticism, news reporting, and scholarly
> reports. There are no legal rules permitting the use of a specific
> number of words, a certain number of musical notes, or percentage of
> a work. Whether a particular use qualifies as fair use depends on all
> the circumstances."
Since the Tanaka corpus was compiled in Japan, is currently being maintained in the UK and is distributed from Australia, the minutiae of US copyright law are a bit peripheral.
I think it's safe to say that most of the texts I would conceivably extract sentence pairs from, should I pursue this project, would be copyrighted in the US (Tuttle, U. of Hawaii, the remains of Weatherhill) and therefore the minutiae of US copyright law, as you say, would be of direct bearing - no? After all, I'm living in a US territory, so any sentences I personally extract must be fair use... since by the act of living here, I am compelled to adhere to the laws here.
An issue that needs to be considered when taking a sentence pair from another work is whether the textbook/whatever actually has copyright over it. From what I have seen, identical sentences crop up all over the place. It is pretty plain that compilers of textbooks and dictionaries are pretty free with their borrowings.
Good point.
That said, when I suggest a sentence based on material from another source for addition to the Tanaka corpus, I always modify it. Usually I change the Japanese a little, e.g. using a different noun here or there, and often I retranslate part or all of it (which in the case of Eijiro-based sentences is often necessary to have a grammatical English sentence.)
There's safety in variation drills.
[...]
> And of course, our government gives great advice on that page when it
> says:
>
> "The safest course is always to get permission from the copyright
> owner before using copyrighted material. The Copyright Office cannot
> give this permission."
>
> Any thoughts?
I think the Tanaka collection is pretty safe from a copyright point of view, due to its massive dilution. I wouldn't go along with adding copyrighted sentences with citations - I say stick to stuff that can safely be placed in the Public Domain. If that means modifying material, do it.
OK.