
Re: Google search of JMdict



I sent this around last month.

On 5 September 2012 14:32, Jim Breen <jimbreen@gmail.com> wrote:
> A correspondent asked me recently about providing more flexible and
> powerful ways of searching the glosses of JMdict entries. It occurred to
> me that it might be interesting to point Google's "Custom Search" at the
> file. I've done that; splitting the dictionary into 160,000+ WWW pages. It's
> only about 20% indexed - it will probably take weeks to complete, but
> there's enough done to play with.
>
> The search page is at http://www.edrdg.org/wwwjgoogle/
>
> Once it's fully indexed, and if it seems useful, I'll make it more widely known.
> I'll also arrange for it to be regularly updated. It only takes a couple
> of minutes to generate.

Unfortunately the indexing of the site has stalled with about 25% of pages
indexed. The indexer visits every few days, but appears only to reindex
the same pages. I guess it isn't really expecting to be hit with 160,000 pages.

I could probably get them indexed by switching from the free service to a
commercial one, but for 160k pages that's very expensive.

I've been thinking about this, and really it should be possible to do all this
in-house. All the words, both English and Japanese, are already indexed
daily in the version used by WWWJDIC. It probably wouldn't be too hard
to put together a usable multi-key search of the "cat AND dog", "apple
NOT orange" variety. I don't know when I'd have the time, but I'll mull it
over.
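
As a rough illustration of the idea only (this is not the WWWJDIC code, and
the real index format isn't described here), a multi-key search over a
word index could be sketched in Python along these lines. The function names
and the sample entries are made up for the example; a real version would
read the existing daily index rather than build one in memory.

```python
from collections import defaultdict

# Made-up sample data: entry id -> gloss text (not real JMdict entries).
SAMPLE_ENTRIES = {
    1: "cat and dog fight",
    2: "apple pie",
    3: "orange and apple juice",
    4: "dog",
}

def build_index(entries):
    """Map each lowercased gloss word to the set of entry ids containing it."""
    index = defaultdict(set)
    for entry_id, gloss in entries.items():
        for word in gloss.lower().split():
            index[word.strip('.,;()"')].add(entry_id)
    return index

def search(index, include, exclude=()):
    """Return ids of entries with every word in `include` and none in `exclude`."""
    if not include:
        return set()
    # AND: intersect the id sets of all required words.
    result = set.intersection(*(index.get(w.lower(), set()) for w in include))
    # NOT: remove ids that contain any excluded word.
    for w in exclude:
        result -= index.get(w.lower(), set())
    return result

index = build_index(SAMPLE_ENTRIES)
cat_and_dog = search(index, ["cat", "dog"])                  # "cat AND dog"
apple_not_orange = search(index, ["apple"], ["orange"])      # "apple NOT orange"
```

Since each word already maps to a set of entries, AND becomes a set
intersection and NOT a set difference, so no rescan of the glosses is needed
at query time.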

Cheers

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University