Jim's WWW-Based Tools

This is a small collection of WWW-based tools which I find handy.

Google n-gram counts. This returns the frequency counts from the Google Japanese n-grams (2007) for one or more terms. Much more reliable than the counts from a WWW search. (Melbourne University site) (Alternative site)

English N-Gram Counts. This server looks up sequences of one to five English words in the Google English N-gram Corpus and returns the count of occurrences of each sequence. (Alternative site)
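
For anyone curious what such a lookup involves, here is a minimal sketch that counts every sequence of one to five words in a toy text and then looks up a phrase. It is not the server's code, and it does not touch the Google corpus; the function names `build_ngram_counts` and `lookup` and the sample text are made up for illustration.

```python
from collections import Counter


def build_ngram_counts(text, max_n=5):
    """Count every run of 1..max_n consecutive words in the text."""
    words = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n words across the text.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def lookup(counts, phrase):
    """Return the count for a phrase (0 if it was never seen)."""
    return counts[tuple(phrase.lower().split())]


# Hypothetical sample text, standing in for a real corpus.
sample = "the cat sat on the mat and the cat slept"
counts = build_ngram_counts(sample)

print(lookup(counts, "the cat"))
print(lookup(counts, "the"))
```

A real corpus of this kind is far too large to count in memory like this; the sketch only shows the shape of the lookup (a word sequence in, a count out).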

Kyoto/Melbourne n-gram counts. This returns the frequency counts from the Kyoto/Melbourne Japanese n-grams (2004). Much more reliable than the counts from a WWW search. (Melbourne University site) (Alternative site)

MeCab/Unidic segmentation. Segmentation of a Japanese sentence by MeCab using the Unidic morpheme lexicon.

ChaSen/IPADIC segmentation. Segmentation of a Japanese sentence by ChaSen using the IPADIC morpheme lexicon. (This has really been replaced by MeCab, but is interesting for comparison purposes.)

Tatoeba/Tanaka Sentence Indexer. This produces draft indices for Japanese sentences in the Tatoeba project. It speeds up the process of creating/amending indices. (Indices are edited in Tatoeba using pages such as this one, but you are supposed to be a "corpus maintainer" to do that.)

Gairaigo Segmenter/Translator. My system for finding the "correct" segmentation of long gairaigo, along with possible translations.

Yojijukugo dictionary

Reverso term-sentence tool.

Ichacha site lookups

IT Words collection

HonyakuStar lookups

The Tofugu counters page.

The Australia-Japan Research Project at the Australian War Memorial. (sample vocab. page)

Bulk Entry page for adding lots of entries to JMdict. It enables EDICT-format entries to be passed fairly easily to the New Entry form.
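
As a rough illustration of what an EDICT-format entry looks like when taken apart, here is a small sketch that splits one line into headword, reading and glosses. It assumes the common `HEADWORD [READING] /gloss/gloss/` layout (with the reading optional for kana-only entries); the exact input the Bulk Entry page accepts is not described here, and the function name `parse_edict_line` is made up for this example.

```python
import re

# Assumed layout: HEADWORD [READING] /gloss1/gloss2/
# The bracketed reading is optional (kana-only entries omit it).
EDICT_LINE = re.compile(
    r"^(?P<head>\S+)(?:\s+\[(?P<reading>[^\]]+)\])?\s+/(?P<glosses>.*)/\s*$"
)


def parse_edict_line(line):
    """Split one EDICT-style line into headword, reading and glosses."""
    match = EDICT_LINE.match(line.strip())
    if match is None:
        raise ValueError("not an EDICT-style line: %r" % line)
    glosses = [g for g in match.group("glosses").split("/") if g]
    return {
        "headword": match.group("head"),
        "reading": match.group("reading"),  # None for kana-only entries
        "glosses": glosses,
    }


entry = parse_edict_line("辞書 [じしょ] /(n) dictionary/lexicon/")
print(entry["headword"], entry["reading"], entry["glosses"])
```

The sketch leaves part-of-speech tags such as `(n)` inside the first gloss rather than trying to interpret them.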

Japanese Wordnet entries which are not in JMdict. This list has been filtered against JMdict and has links to both JWN and the big "combined" file in WWWJDIC. It can be handy for testing/generating new entries.

While I am at it, I'll include links to a couple of maintenance pages.

Possible extra surface forms. This is a set of alternative surface forms for about 15k JMdict entries, as indicated in the Unidic lexicon.

Weekly Tatoeba Index Error Report. This reports the sentences in Tatoeba where the Japanese indices no longer match the sentence itself.

Jim
July 2013/August 2015/July 2024