RE: [edict-jmdict] database schema
Ben Bullock wrote:
> Thanks for posting your example, it was educational, but it's a little
> hard to understand. I wonder if you could add some examples of how
> actual data fits into the database?
Yes, good idea. I will write up a short example and post it today.
> Also I wonder if it won't waste storage to use very large fixed length
> records? The "kanji" and "kana" tables each have 2048 varchar spaces.
> I recently made a similar database for a more modest project for
> kanjidic, and I used the "tinytext" for the kanji meanings.
I think varchar(X) types use roughly the space required by the
actual strings stored, not the declared length. But yes, the declared
lengths are probably excessive. I wanted to make sure they
were long enough to hold anything that might turn up in any of
the *dict files or the examples file.
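As a rough illustration of that point, here is a minimal sketch comparing the two storage models. It assumes UTF-8 text, a 2-byte length prefix per variable-length value, and 3 bytes reserved per declared character for a fixed-length column; those figures are assumptions for illustration and vary by database, and the helper names are hypothetical.

```python
# Rough storage estimate: variable-length vs. fixed-length columns.
# All sizing constants here are illustrative assumptions, not the
# behaviour of any particular database engine.

def varchar_bytes(value: str, prefix_bytes: int = 2, encoding: str = "utf-8") -> int:
    """Approximate bytes for one variable-length value: payload plus length prefix."""
    return len(value.encode(encoding)) + prefix_bytes

def fixed_bytes(declared_chars: int, max_bytes_per_char: int = 3) -> int:
    """Approximate bytes for one fixed-length value: space is reserved up front."""
    return declared_chars * max_bytes_per_char

samples = ["猫", "ねこ", "cat"]  # made-up sample values

var_total = sum(varchar_bytes(s) for s in samples)
fixed_total = fixed_bytes(2048) * len(samples)

print(var_total, fixed_total)
```

Under these assumptions the declared length of 2048 costs nothing for short values in the variable-length case, while a truly fixed-length column would reserve the full width for every row.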
In jmdict it turns out that all the kanji and reading entries are
very short, but there is a gloss of 400+ characters (seq=1507130).
In the examples file, the longest Japanese and English 'A' texts
are 127 and 253 characters respectively.
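Figures like these can be found by scanning the already-extracted strings for the longest one. The sketch below assumes the values have been parsed into a Python list; the `longest` helper and the sample glosses are hypothetical, not taken from the actual files.

```python
# Find the longest string (in characters) among already-extracted values,
# e.g. to choose a sensible declared column length.

def longest(values):
    """Return (length_in_characters, value) for the longest string, or (0, "") if empty."""
    best = max(values, key=len, default="")
    return len(best), best

glosses = ["cat", "to eat", "a fairly long illustrative gloss"]  # made-up data

length, text = longest(glosses)
print(length, text)
```

Note that `len` counts characters, not bytes, which matches how varchar(X) lengths are usually declared.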