Re: [edict-jmdict] Re: Example sentences 例文募集



I just had a thought about Google/Yahoo hits and the validity thereof. Call it "Reduction to Absurdity" if you will.
 
I wonder just how much frequency dictionaries of English would be modified by hits on "Lindsay Lohan" or "Hannah Montana". Maybe we should do a Google hit count.
 
All things considered, though, the Edict project here does a pretty good job of at least trying to be valid and relevant. If you don't think so, imagine the poor soul in Japan who learns his English from the ALC/Eijiro site.
 
Now that is Reduction to Absurdity!

 
2008/6/18 Jim Breen <jimbreen@*********>:

2008/6/19 Francis Bond <bond@********>:


> However, it turns out that each individual speaker tends to live in a
> circumscribed world, so Google can be used to supplement our
> intuitions. In the field of NLP, where I spend my time, 英日 is much,
> much more frequent than 和英, which is primarily used for human
> dictionaries.

That's my observation too.

We all know that written and spoken languages differ in various
respects. One of the interesting things about the WWW as a corpus
and Google/Yahoo counts as indicators of relative word
frequencies is that the WWW tends to be more like the spoken
language than traditional text corpora such as collections of
newspaper articles, learned papers, etc. etc. All those blogs
with people pouring out their raw patois.

Cheers

Jim

--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/