
Re: [edict-jmdict] aruyouni

To: edict-jmdict@***************
Subject: Re: [edict-jmdict] aruyouni
From: René Malenfant <rene_malenfant@***********>
Date: Mon, 18 Feb 2008 21:49:04 +0900

I don't know what ちゃせん is, but I'm fairly sure that Google has some kind of corpus that it uses to parse input strings into constituent words.

Of course I could be dead wrong about this. This is what Jim and Paul specialize in.



Rene


On 18-Feb-08, at 9:44 PM, Jim Rose wrote:

What you're suggesting, then, is that an unquoted string is run through an analysis such as ChaSen before being run through a database? Otherwise, how would Google know to parse that string into three words?
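
For illustration only, here is a toy sketch of how an unspaced string could be split into words by greedy longest-match lookup against a word list. The `segment` function and the three-entry `LEXICON` are hypothetical; nothing in this thread confirms that Google or ChaSen works this way, and a real morphological analyser is far more sophisticated.

```python
def segment(text, lexicon):
    """Greedy longest-match segmentation.

    At each position, take the longest lexicon entry that matches;
    if nothing matches, emit a single character and move on.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon, just large enough for the example string.
LEXICON = {"ある", "よう", "に"}

print(segment("あるように", LEXICON))
```

With this lexicon the string comes out as the three words ある, よう and に, which is the kind of split that would explain the inflated hit count.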






On Feb 18, 2008, at 8:29 AM, René Malenfant wrote:

Well, that comment was directed at Paul, but I just tried what you
suggested and I got the same 140,000,000 result as you.

The [G] link does not appear to be putting the search string in
quotation marks; i.e., it looks for あるように, not "あるように",
so it drastically overestimates the number of hits. (AFAICT, it
searches for any page that has ある、よう and に, but not
necessarily as "あるように". With three such common words, it's
basically returning every Japanese page on the web.)

If there's no technical difficulty preventing it, perhaps the [G]
links should be changed to use quotation marks in their search
strings?
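
A minimal sketch of what that change might look like, assuming the [G] link is just a URL with the headword in a `q` parameter. The base URL and the `build_search_url` helper are my own assumptions for illustration; I have not seen how the link is actually generated.

```python
from urllib.parse import urlencode

# Assumed base URL; the real [G] link target is not shown in this thread.
GOOGLE_SEARCH = "https://www.google.com/search"


def build_search_url(term, exact=True):
    """Return a search URL for `term`.

    When `exact` is true, the term is wrapped in double quotes so the
    search engine treats it as a phrase rather than separate words.
    """
    query = f'"{term}"' if exact else term
    # urlencode percent-encodes the quotes and the UTF-8 bytes of the term.
    return f"{GOOGLE_SEARCH}?{urlencode({'q': query})}"


print(build_search_url("あるように"))
print(build_search_url("あるように", exact=False))
```

The only difference between the two URLs is the pair of encoded quotation marks (`%22`) around the term, so the change should be small if the links are built this way.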


Rene

