
kanjidic



Hello all,
I wanted to introduce myself to the group and tell you a little bit about my project.
I started to create a Kanji study application and decided that its database generation component might be better broken off into its own application.

http://github.com/markburns/wwwjdic2db

I realise that this project seems to be on the Python/Mercurial path, whereas my project is Ruby/Git, but hopefully we can still share ideas.

I wanted to try and avoid any duplication of effort and possibly to make my project useful to the community. When I started this project there were no plans to make a database the primary source for kanjidic, so I was under the impression that my project would be a nice addition to the community.
However, I have been in touch with Jim Breen and it seems like the plan is to go with databases all the way.

Anyway, because Rails uses ActiveRecord, my project is database agnostic, so it shouldn't be a big deal to generate a PostgreSQL database.

wwwjdic2db is probably around 95% complete and does the following:

Downloads the full compressed kanjidic file and uncompresses it
Downloads deltas using rsync
Compiles the text file into a database

TODO:
Only import the deltas rather than regenerate the full database.
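
As a rough illustration of the "compile the text file" step, here is a minimal Python sketch that splits one kanjidic-style line into the kinds of fields the tables further down would hold. It is not taken from wwwjdic2db. The function name, the dictionary keys, and the abbreviated sample line are all made up for illustration. The field handling assumes the usual kanjidic conventions: katakana for on readings, hiragana for kun readings, T1 before nanori, Y for Pinyin, W for Korean, and meanings in braces.

```python
import re

def is_kana(ch):
    # Hiragana and katakana occupy U+3040..U+30FF.
    return "\u3040" <= ch <= "\u30ff"

def is_katakana(ch):
    return "\u30a0" <= ch <= "\u30ff"

def parse_kanjidic_line(line):
    """Split one kanjidic-style line into readings, romanisations and meanings."""
    meanings = re.findall(r"\{([^}]*)\}", line)
    tokens = re.sub(r"\{[^}]*\}", "", line).split()
    entry = {
        "kanji": tokens[0], "jis": tokens[1],
        "onyomi": [], "kunyomi": [], "nanori": [],
        "korean": [], "pinyin": [], "meanings": meanings,
    }
    mode = "reading"  # switches to "nanori" after T1, "radical" after T2
    for tok in tokens[2:]:
        first = tok.lstrip("-")[:1]  # kun readings may carry a leading "-"
        if tok == "T1":
            mode = "nanori"
        elif tok == "T2":
            mode = "radical"
        elif first and is_kana(first):
            if mode == "nanori":
                entry["nanori"].append(tok)
            elif mode == "reading":
                key = "onyomi" if is_katakana(first) else "kunyomi"
                entry[key].append(tok)
            # radical names (after T2) are ignored in this sketch
        elif tok.startswith("Y"):
            entry["pinyin"].append(tok[1:])
        elif tok.startswith("W"):
            entry["korean"].append(tok[1:])
        # every other coded field (U, G, S, ...) is skipped here
    return entry

# Abbreviated sample line, for illustration only.
sample = "亜 3021 U4e9c G8 S7 Yya4 Wa ア つ.ぐ T1 や つぎ つぐ {Asia} {rank next}"
entry = parse_kanjidic_line(sample)
```

A real importer would also need to decode the downloaded file and keep the dictionary index codes, which this sketch skips.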

I have a suspicion that I didn't run the tests before my last commit to the project, so I think it's currently in a half-working state, but it shouldn't take me long to clear this up.

Also, if you would rather not use multiple technologies and would prefer to replicate the functionality of my project in Python, then I might be interested in helping out with that, although it would be a learning experience for me.

By the way, as for data structure, I have the following tables (standard Rails naming conventions):

kanjis
onyomis
kunyomis
nanoris
onyomis_kanjis
kunyomis_kanjis
nanoris_kanjis
koreans
pinyins
koreans_kanjis
pinyins_kanjis
dictionaries (list of dictionaries indexed)
kanji_lookups (join between kanjis and dictionaries)
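
To make the many-to-many layout concrete, here is a small sketch of one reading table and its join table, using Python's built-in sqlite3 module. Only the table names come from the list above. The column names (literal, reading, onyomi_id, kanji_id) and the two helper functions are assumptions for illustration, not the actual wwwjdic2db schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kanjis  (id INTEGER PRIMARY KEY, literal TEXT UNIQUE NOT NULL);
CREATE TABLE onyomis (id INTEGER PRIMARY KEY, reading TEXT UNIQUE NOT NULL);
-- Join table: one row per (reading, kanji) pair.
CREATE TABLE onyomis_kanjis (
    onyomi_id INTEGER NOT NULL REFERENCES onyomis(id),
    kanji_id  INTEGER NOT NULL REFERENCES kanjis(id),
    PRIMARY KEY (onyomi_id, kanji_id)
);
""")

def add_onyomi(conn, literal, reading):
    """Store a kanji, an on reading, and the link between them (idempotent)."""
    conn.execute("INSERT OR IGNORE INTO kanjis (literal) VALUES (?)", (literal,))
    conn.execute("INSERT OR IGNORE INTO onyomis (reading) VALUES (?)", (reading,))
    conn.execute(
        "INSERT OR IGNORE INTO onyomis_kanjis (onyomi_id, kanji_id) "
        "SELECT o.id, k.id FROM onyomis o, kanjis k "
        "WHERE o.reading = ? AND k.literal = ?",
        (reading, literal),
    )

def kanji_with_onyomi(conn, reading):
    """Return every kanji that shares the given on reading."""
    rows = conn.execute(
        "SELECT k.literal FROM kanjis k "
        "JOIN onyomis_kanjis j ON j.kanji_id = k.id "
        "JOIN onyomis o ON o.id = j.onyomi_id "
        "WHERE o.reading = ? ORDER BY k.literal",
        (reading,),
    )
    return [row[0] for row in rows]

add_onyomi(conn, "亜", "ア")
add_onyomi(conn, "阿", "ア")
shared = kanji_with_onyomi(conn, "ア")
```

Because each reading is stored once and linked through the join table, two kanji that share a reading point at the same onyomis row. The kunyomi, nanori, Korean and Pinyin tables would presumably follow the same pattern.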

I'm thinking about renaming the various kanji lookup join tables to kanjis_onyomis, kanjis_kunyomis, etc.
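
For reference, the proposed names match the order Rails uses by default for has_and_belongs_to_many join tables, which joins the two table names in lexical order. A tiny Python sketch of that rule follows. The function name is hypothetical, and plain sorting only approximates what Rails does (Rails has extra handling for names that share a prefix).

```python
def default_join_table(table_a, table_b):
    """Approximate the default Rails join-table name: both names in lexical order."""
    return "_".join(sorted([table_a, table_b]))

# The reading tables from the list above, paired with "kanjis".
reading_tables = ["onyomis", "kunyomis", "nanoris", "koreans", "pinyins"]
renamed = [default_join_table(name, "kanjis") for name in reading_tables]
```

Since "kanjis" sorts before every other table name in the list, each join table would start with kanjis_.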


I think that's it. I'd like to hear your thoughts on how this might be made useful to the project as a whole.

Are there plans to turn all the separate data sources into one large database?
So that's about it. I just wanted to introduce myself and get any thoughts.

Cheers,

Mark