RE: [edict-jmdict] database schema



Ben Bullock wrote:
> Thanks for posting your example, it was educational, but it's a little
> hard to understand. I wonder if you could add some examples of how
> actual data fits into the database?

Yes, good idea.  I will write and post a short example today.

> Also I wonder if it won't waste storage to use very large fixed length
> records? The "kanji" and "kana" tables each have 2048 varchar spaces.
> I recently made a similar database for a more modest project for
> kanjidic, and I used the "tinytext" for the kanji meanings.

As far as I know, varchar(X) columns use roughly the space required
by the strings actually stored, not the declared length.  But yes,
the declared lengths are probably excessive.  I wanted to make sure
they were long enough to hold anything that might turn up in any of
the *dict files or the examples file.

In jmdict it turns out that all the kanji and reading entries are
very short, but there is a 400+ character gloss (seq=1507130).
In the examples file the longest Japanese and English 'A' texts
are 127 and 253 characters respectively.
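
For anyone who wants to check figures like these against their own
copy of the data, here is a rough sketch of how the longest value in
a column could be found.  It is only an illustration: it uses Python's
built-in sqlite3 module rather than the database the schema was
written for, and the "gloss" table, its "seq" and "txt" columns, and
the sample rows are all made up for the example.

```python
import sqlite3

def longest_value(conn, table, column):
    """Return (length, text) of the longest value in table.column, or None if empty."""
    # Identifiers cannot be bound as query parameters, so they are quoted
    # by hand here; only pass trusted table and column names.
    sql = (
        f'SELECT LENGTH("{column}"), "{column}" FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL '
        f'ORDER BY LENGTH("{column}") DESC LIMIT 1'
    )
    return conn.execute(sql).fetchone()

# Hypothetical stand-in for the real schema: a single "gloss" table
# filled with placeholder rows, not actual jmdict data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gloss (seq INTEGER, txt TEXT)")
conn.executemany(
    "INSERT INTO gloss VALUES (?, ?)",
    [(1, "short"), (2, "a somewhat longer gloss"), (3, "mid length")],
)

length, text = longest_value(conn, "gloss", "txt")
print(length, text)
```

Running the same kind of query over each text column after a full
load would show how much headroom the declared varchar lengths
really need.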