
Re: [edict-jmdict] English n-gram counts



G'day,

Very fast search!  For users less familiar with the corpus, it might be
good to show on the page the total number of n-grams (1-5) so that people
can calculate relative frequencies, and maybe to mention that there is a
cut-off: low-frequency n-grams will not appear.
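
To make the relative-frequency idea concrete, here is a minimal Python sketch of the arithmetic. The function names and the numbers in the usage lines are made up for illustration; they are not taken from the corpus or from the search page, and the real totals would have to come from the corpus itself.

```python
def relative_frequency(count: int, total: int) -> float:
    """Share of all n-grams of one order that a given n-gram accounts for.

    count -- raw count returned for the n-gram (hypothetical input)
    total -- total number of n-gram tokens of the same order (hypothetical input)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if count < 0:
        raise ValueError("count must not be negative")
    return count / total


def per_million(count: int, total: int) -> float:
    """The same ratio scaled to occurrences per million n-grams."""
    if total <= 0:
        raise ValueError("total must be positive")
    if count < 0:
        raise ValueError("count must not be negative")
    # Multiply before dividing to keep the arithmetic simple to follow.
    return count * 1_000_000 / total


if __name__ == "__main__":
    # Illustrative numbers only, not real corpus figures.
    print(relative_frequency(50, 1_000))
    print(per_million(50, 1_000_000))
```

Because of the cut-off, a phrase that returns no hits is not necessarily absent from the underlying text: its count may simply be below the threshold, so "not found" is better read as "rare" than as a relative frequency of zero.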

On Thu, Nov 28, 2019 at 10:39 AM Jim Breen jimbreen@gmail.com
[edict-jmdict] <edict-jmdict@yahoogroups.com> wrote:

>
>
> Earlier today I was mentioning the Google English n-gram corpus
> in the context of finding the frequency of certain phrases. I realised
> that I'd implemented a system for searching that corpus years ago
> for my gairaigo segmenter at:
> http://nlp.cis.unimelb.edu.au/jwb/gairaigo.html
> but I'd never actually made it more generally available. Here it is:
>
> http://nlp.cis.unimelb.edu.au/jwb/engngrams.html
>
> Someone may find it useful. (FWIW the actual corpus is about 55 GB.)
>
> Jim
>
> --
> Jim Breen
> Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
> http://www.jimbreen.org/
> http://nihongo.monash.edu/
> 
>


-- 
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University