Re: [edict-jmdict] Indicating 高低アクセント
2008/7/15 Francis Bond <bond@ieee.org>:
> I think that just copying over the accents from 大辞林 without permission
> would definitely be a breach of copyright. I know that when Amano-san
> wanted similar data for the 語彙特性 he retested all the accents from
> scratch, and it was a lot of work.
The actual information about accents in Japanese is, of course, not
copyrightable. No-one can claim to have created the information that
牡蠣 is HL, 垣 is LH(L) and 柿 is LH(H). Maybe if only one entity had
collected the information, there might be some claim, but with many
sources including the NHK日本語発音アクセント辞典 and the
新明解日本語アクセント辞典 having it, no-one can claim to have
the sole source. In this respect, the pitch accents are in a similar
category to 猫 = cat or 犬 = dog.
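As a rough illustration of how little a per-word accent code carries, here
is a minimal Python sketch that expands such a code into the H/L notation
used above. It assumes the common convention in which the code is the
number of the mora after which the pitch drops, with 0 meaning no drop;
the function name pitch_pattern is hypothetical and is not part of
JMdict/EDICT or of any published dictionary.

```python
def pitch_pattern(mora_count: int, accent: int) -> str:
    """Expand an accent code into an H/L string, Tokyo-style.

    Assumption: `accent` is the mora after which the pitch drops
    (0 = no drop). The pitch of a following particle is appended
    in parentheses.
    """
    if mora_count < 1 or not 0 <= accent <= mora_count:
        raise ValueError("accent must lie between 0 and mora_count")

    morae = []
    for i in range(1, mora_count + 1):
        if accent == 1:
            high = (i == 1)          # high start, low afterwards
        elif accent == 0:
            high = (i > 1)           # low start, then high with no drop
        else:
            high = (1 < i <= accent) # low start, high up to the drop
        morae.append("H" if high else "L")

    # Only words with no drop keep the following particle high.
    particle = "H" if accent == 0 else "L"
    return "".join(morae) + "(" + particle + ")"


# The three two-mora words above, assuming codes 1, 2 and 0.
for word, code in [("牡蠣", 1), ("垣", 2), ("柿", 0)]:
    print(word, pitch_pattern(2, code))
```

Under that assumption the three かき words differ only in a single digit,
and the H/L strings can be regenerated from it at any time.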
This raises the question of what is copyrightable about
dictionaries, since they largely contain information which is very
much public knowledge. This is something that is always discussed
in manuals and guides to lexicography, because it is far from
black and white, and also varies a bit from jurisdiction to
jurisdiction. (At present JMdict/EDICT would come under Australian
copyright law, but if the editing moved to an online system on
arakawa it would probably move into US jurisdiction.)
Most authorities say that what is copyrightable about dictionaries
is:
(a) the collection itself. You can't make off with an entire
dictionary and call it "Fred's Koujien". (Not as silly as it
sounds - in the 18C that sort of thing happened a couple of
times.)
(b) the organization, i.e. the work you put into the
layout, look, feel, etc.
(c) the wording of entries. As this is part of the creative
content it is an important point, and I try to follow Sidney
Landau's advice (Dictionaries: The Art and Craft of Lexicography),
which is to consult several sources, and create an entry as much
as possible in my own words. In cases like 猫 = cat there are
limits to how far you can do this (some people don't even try too hard
- as Tom Gally has pointed out, many 国語辞典 have virtually identical
glosses, often for quite complex things.) I extend this to other
things as well. The 語彙特性 Francis mentioned has a wonderfully
detailed classification system for words, into which people at
NTT put a lot of work. Much as I would like to have those codes
in JMdict, I regard them as NTT's creation, and wouldn't
touch them without NTT's permission.
Is it OK to consult other dictionaries when compiling your own?
Of course. Everyone does it. Andrew Nelson wrote in his Foreword:
"I am indebted to the following works for accurate lexicographic
information .." and went on to list 38 dictionaries and word lists.
That's a little more subtle than James Murray (the early editor
of the OED), who said that "lexicographers steal shamelessly from
each other", but the message is much the same. No-one compiles
dictionaries just by looking at "free" sources.
The question then is whether we are free to use published
dictionaries such as 大辞林, the NHK日本語発音アクセント辞典,
etc. as sources of accent information. In my view we can,
particularly if the notation that ends up in JMdict/EDICT is
different from theirs.
Can the accents simply be extracted from 大辞林, or do we have to
cross-check several sources and reach a conclusion about
each one?
- in the US, where the Feist ruling applies, simple extraction
would clearly be fine, as JMdict's and 大辞林's collections of
entries are different, with different glosses and layouts,
and all that would be taken across is a relatively small number
of facts.
- in Japan? Possibly - things are regarded as much tighter
there, and Amano-san's efforts with the 語彙特性 (subsequently
published by Sanseido!) were understandably cautious. OTOH I
am aware of a company in Japan that sells word lists - huge ones
covering everything under the sun. Where do they come from? -
published sources including dictionaries. Have they been sued? -
no, although the publishers are quite aware of it. Since the
published lists are not dictionaries, don't contain glosses, and are
in a very different format from the published sources, an actual
breach of copyright would be very hard to sustain, and the company in
question (and some of you will know of it) has got legal opinions
clearing its practices.
- in Australia? Quite possibly. In a case a few years back
(http://www.findlaw.com.au/article/6404.htm)
an attempt to republish the telephone book was struck down, establishing
a clear difference from the Feist case in the US. Looking at the reasons
for that judgement one sees that a key factor was the "material
reproduction of the directories and headings", and as one judge found:
"Copyright will be infringed only where the alleged infringer takes a
substantial part of the copyright work". Earlier cases where the phone
book was used to generate mailing lists (just names and addresses; no phone
numbers) were not found to be copyright infringements, as they didn't
produce telephone directories.
So do I think there is a significant risk in taking a one-digit code
representing a fact from approximately 20% of entries where 大辞林 and
JMdict intersect? No, I don't, just as I thought there was no risk when I
compared JMdict's nouns with those in 大辞林 and looked for a スル tag
when fleshing out the "vs" tags. Both are cases of factual information
making up a very small proportion of both works.
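To make the scale of that operation concrete, here is a minimal Python
sketch of copying a one-digit code only for entries that two word lists
share. Everything in it is an assumption for illustration: the
(headword, reading) keys, the "accent" field name, the function
copy_accent_codes and the toy data are hypothetical and do not reflect
the real JMdict or 大辞林 file formats.

```python
def copy_accent_codes(jmdict, source):
    """Attach an accent code to each entry that also appears in `source`.

    jmdict: dict mapping (headword, reading) -> entry dict
    source: dict mapping (headword, reading) -> one-digit accent code
    Returns a new dict of entries plus the number of entries matched.
    Nothing else is taken from `source`: no glosses, no layout.
    """
    merged = {}
    matched = 0
    for key, entry in jmdict.items():
        new_entry = dict(entry)            # leave the input untouched
        if key in source:
            new_entry["accent"] = source[key]
            matched += 1
        merged[key] = new_entry
    return merged, matched


# Toy data only, not taken from either dictionary's files.
toy_jmdict = {
    ("柿", "かき"): {"gloss": "persimmon"},
    ("猫", "ねこ"): {"gloss": "cat"},
    ("犬", "いぬ"): {"gloss": "dog"},
}
toy_source = {("柿", "かき"): 0, ("垣", "かき"): 2}

merged, matched = copy_accent_codes(toy_jmdict, toy_source)
print(matched, "of", len(toy_jmdict), "entries received a code")
```

The same shape of lookup would cover the スル comparison, with a flag in
place of the digit.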
Anyway, I'm sure we will have more debate on this.
BTW, as well as having read a lot on dictionary copyright, I have
discussed it several times in the past with lawyers. I have never
sought a formal opinion, as for that you need a specific case and a
lot of money. Actual legal copyright cases involving dictionaries are
practically impossible to find.
Cheers
Jim
--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/