Re: [edict-jmdict] English n-gram counts
G'day,
Very fast search! For users less familiar with the corpus, it might be
good to show on the page the total number of n-grams for each order
(1-5), so that people can calculate relative frequencies. It may also be
worth mentioning that there is a cut-off: low-frequency n-grams will not
appear.
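To make the arithmetic concrete, here is a minimal sketch of the
calculation I have in mind. The function names and the numbers are
purely illustrative (they are not taken from the corpus or from the
search page); the real per-order totals would have to come from the
corpus documentation.

```python
def relative_frequency(count: int, total: int) -> float:
    """Share of all n-grams of one order accounted for by one n-gram.

    `count` is the raw count returned by a search; `total` is the total
    count of all n-grams of the same order (hypothetical input here).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if count < 0 or count > total:
        raise ValueError("count must be between 0 and total")
    return count / total


def per_million(count: int, total: int) -> float:
    """The same ratio scaled to occurrences per million n-grams."""
    relative_frequency(count, total)  # reuse the input validation
    return count * 1_000_000 / total


# Illustrative numbers only, not real corpus figures.
example_count = 250
example_total = 1_000_000
print(relative_frequency(example_count, example_total))
print(per_million(example_count, example_total))
```

Because of the cut-off, a phrase that returns no hits is not
necessarily at frequency zero; it is only known to be below the
threshold.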
On Thu, Nov 28, 2019 at 10:39 AM Jim Breen jimbreen@gmail.com
[edict-jmdict] <edict-jmdict@yahoogroups.com> wrote:
>
>
> Earlier today I was mentioning the Google English n-gram corpus
> in the context of finding the frequency of certain phrases. I realised
> that I'd implemented a system for searching that corpus years ago
> for my gairaigo segmenter at:
> http://nlp.cis.unimelb.edu.au/jwb/gairaigo.html
> but I'd never actually made it more generally available. Here it is:
>
> http://nlp.cis.unimelb.edu.au/jwb/engngrams.html
>
> Someone may find it useful. (FWIW the actual corpus is about 55GB.)
>
> Jim
>
> --
> Jim Breen
> Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
> http://www.jimbreen.org/
> http://nihongo.monash.edu/
>
>
--
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University