Re: [edict-jmdict] Database project, JMDict Structure
> http://www.edrdg.org/~mcg/
http://www.edrdg.org/~smg/
http://akubimarco.free.fr/Jmdict_struct.pdf
There you can find the structure of the database I use for JMdict.
It has three parts:
-The multilingual part, composed of the tables:
LANGUAGES,
ANNOTATION_TYPES, ANNOTATIONS, ANNOTATION_TYPE_LIST,
CONTENTS, CONTENT_LIST,
EXAMPLES_CONTENT,
ENTRY_RELATION_TYPES, ENTRY_RELATION_LIST,
COMMENT_TYPES, COMMENT_TYPE_LIST, COMMENTS
-The association part:
ANNOTATION_TYPES_ASSOC: maps annotations to their respective annotation
types.
RW_ANNOTATIONS: specifies annotations for a specific reading/writing.
ENTRY_CONTENT: glosses for the entry.
ENTRY_ENTRY_RELATION, ENTRY_ENTRY_SEQUENCE_RELATION: the first table is
empty for now, but should replace the second one. They model the xref
and ant elements. I believe that those elements should point to a sense
and not to a reading/writing.
By retrieving the main reading/writing of a JMdict entry, I can at least
fill ENTRY_ENTRY_SEQUENCE_RELATION, which gives a relation between a
sense and a JMdictSeq. That is not what I want, but it is all I can do
with the data available.
COMMENT_ENTRY_ASSOC: associates comments with the different entries
(only s_inf, sense information, at the moment).
-The reading/writing part:
By analyzing stagr, re_restr and so on, I build three tables.
READING_WRITING: all possible readings and writings for each JMdict
sequence.
ENTRY_RW_ASSOCIATION: associates readings and writings with the sense
level, marking the main ones.
ALTERNATIVE_RW: links each reading/writing to its alternate
readings/writings.
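To make the reading/writing part easier to discuss, here is a minimal sketch of how the three tables could fit together, written with Python's sqlite3 module. Only the three table names come from my database; every column name, the sample sequence number, and the sample rows are assumptions made up for this illustration, and the real layout is the one in the PDF.

```python
import sqlite3

# Hypothetical column layout: only the three table names come from the message.
SCHEMA = """
CREATE TABLE READING_WRITING (
    rw_id      INTEGER PRIMARY KEY,
    jmdict_seq INTEGER NOT NULL,   -- JMdict sequence number (sample value below)
    text       TEXT    NOT NULL,   -- the kana or kanji form
    kind       TEXT    NOT NULL CHECK (kind IN ('reading', 'writing'))
);
CREATE TABLE ENTRY_RW_ASSOCIATION (
    entry_id INTEGER NOT NULL,     -- internal sense-level ID
    rw_id    INTEGER NOT NULL REFERENCES READING_WRITING(rw_id),
    is_main  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (entry_id, rw_id)
);
CREATE TABLE ALTERNATIVE_RW (
    rw_id     INTEGER NOT NULL REFERENCES READING_WRITING(rw_id),
    alt_rw_id INTEGER NOT NULL REFERENCES READING_WRITING(rw_id),
    PRIMARY KEY (rw_id, alt_rw_id)
);
"""


def main_forms(conn, entry_id):
    """Return the (kind, text) pairs flagged as main for one sense-level entry."""
    rows = conn.execute(
        "SELECT rw.kind, rw.text FROM ENTRY_RW_ASSOCIATION a "
        "JOIN READING_WRITING rw ON rw.rw_id = a.rw_id "
        "WHERE a.entry_id = ? AND a.is_main = 1 ORDER BY rw.kind",
        (entry_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Invented sample data: one reading and two writings under one sequence number.
conn.executemany("INSERT INTO READING_WRITING VALUES (?, ?, ?, ?)", [
    (1, 1000000, "たべる", "reading"),
    (2, 1000000, "食べる", "writing"),
    (3, 1000000, "喰べる", "writing"),
])
# Sense 10 uses all three forms; the first two are marked as the main ones.
conn.executemany("INSERT INTO ENTRY_RW_ASSOCIATION VALUES (?, ?, ?)", [
    (10, 1, 1), (10, 2, 1), (10, 3, 0),
])
# Writing 3 is recorded as an alternate of writing 2.
conn.execute("INSERT INTO ALTERNATIVE_RW VALUES (2, 3)")
```

Calling main_forms(conn, 10) then gives back only the forms marked as main for that sense, which is the lookup I use to fill ENTRY_ENTRY_SEQUENCE_RELATION.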
The main differences from the proposed database are:
-I added multilingual support. Everything can be stored in several
languages, including the metadata about the information in the database
(annotations, examples, descriptions of relation types between entries,
etc.).
-The concepts of COMMENTS and ANNOTATIONS.
I replaced all the field, misc, pos, etc. elements with annotations.
I replaced s_inf with comments.
Annotations are static information about an entry.
Comments are free-text information.
Annotations are a generalization of the field, misc, pos, etc. elements.
At the moment I have 7 types of annotations: field, part of speech,
grammatical, gender, frequency (news1, news2...), reading and writing
(ok, oK, uk...), dialect.
The advantage is that it is simpler to add a new type or value for
comments or annotations.
-Since all the precise information is at the sense level, I use internal
IDs rather than the JMdict sequence number, and information is attached
to those IDs (readings, relations between entries...). So what I call an
entry is a sense.
-I don't share readings between entries even if they are identical,
because there can be comments on the reading itself (re_inf).
-The structure I use for readings and writings is different from
JMdict's, but I designed it to preserve all the information present in
JMdict.
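As an illustration of the annotation idea in the list above, here is a minimal sketch in Python with sqlite3. The three table names are from my database, but the columns, the helper functions, and the "register"/"formal" pair are hypothetical and only serve the example. The point it tries to show is that a new annotation type is just a new row, not a new column or table.

```python
import sqlite3

# Hypothetical columns: only the table names come from the message.
SCHEMA = """
CREATE TABLE ANNOTATION_TYPES (
    type_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE       -- e.g. 'frequency', 'dialect'
);
CREATE TABLE ANNOTATIONS (
    annotation_id INTEGER PRIMARY KEY,
    code          TEXT NOT NULL        -- e.g. 'news1'
);
CREATE TABLE ANNOTATION_TYPES_ASSOC (
    annotation_id INTEGER NOT NULL REFERENCES ANNOTATIONS(annotation_id),
    type_id       INTEGER NOT NULL REFERENCES ANNOTATION_TYPES(type_id),
    PRIMARY KEY (annotation_id, type_id)
);
"""


def add_annotation(conn, type_name, code):
    """Insert an annotation value, creating its type row if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO ANNOTATION_TYPES (name) VALUES (?)", (type_name,)
    )
    type_id = conn.execute(
        "SELECT type_id FROM ANNOTATION_TYPES WHERE name = ?", (type_name,)
    ).fetchone()[0]
    annotation_id = conn.execute(
        "INSERT INTO ANNOTATIONS (code) VALUES (?)", (code,)
    ).lastrowid
    conn.execute(
        "INSERT INTO ANNOTATION_TYPES_ASSOC VALUES (?, ?)", (annotation_id, type_id)
    )
    return annotation_id


def codes_for_type(conn, type_name):
    """List the annotation codes registered under one annotation type."""
    rows = conn.execute(
        "SELECT a.code FROM ANNOTATIONS a "
        "JOIN ANNOTATION_TYPES_ASSOC x ON x.annotation_id = a.annotation_id "
        "JOIN ANNOTATION_TYPES t ON t.type_id = x.type_id "
        "WHERE t.name = ? ORDER BY a.code",
        (type_name,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_annotation(conn, "frequency", "news1")
add_annotation(conn, "frequency", "news2")
# A brand-new (hypothetical) type needs no schema change, only new rows.
add_annotation(conn, "register", "formal")
```

With a fixed-column design, the last call would have required altering the schema; here it only adds one row to each of the three tables.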
Another comment about the currently proposed database: I don't
understand why the entry ID is repeated in all tables. For example, take
rinf and rdgn. There is already a link between each entr and rdgn in the
rdgn table, so what is the point of repeating entr in the rinf table?
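To make the question concrete, here is a small sketch of the two layouts I have in mind, in Python with sqlite3. Both layouts are my own guesses for the sake of discussion: the column names (rdgn, id, rdgn_id, txt, kw) and the sample rows are assumptions, not the actual definitions of the proposed schema. In the first layout a reading is identified by the pair (entr, rdgn), so rinf has to carry entr to reference it. In the second, a reading has its own ID and rinf does not need entr at all.

```python
import sqlite3

# Layout A (assumed): readings keyed by (entr, rdgn), so rinf repeats entr.
COMPOSITE = """
CREATE TABLE rdgn (entr INTEGER, rdgn INTEGER, txt TEXT,
                   PRIMARY KEY (entr, rdgn));
CREATE TABLE rinf (entr INTEGER, rdgn INTEGER, kw TEXT,
                   FOREIGN KEY (entr, rdgn) REFERENCES rdgn (entr, rdgn));
"""

# Layout B (assumed): readings have their own ID, rinf points only at it.
SURROGATE = """
CREATE TABLE rdgn (id INTEGER PRIMARY KEY, entr INTEGER, txt TEXT);
CREATE TABLE rinf (rdgn_id INTEGER REFERENCES rdgn (id), kw TEXT);
"""


def rinf_for_entry_composite(conn, entr):
    """Layout A: the entry ID is in rinf, so no join is needed."""
    rows = conn.execute("SELECT kw FROM rinf WHERE entr = ? ORDER BY kw", (entr,))
    return [row[0] for row in rows]


def rinf_for_entry_surrogate(conn, entr):
    """Layout B: reach the entry by joining through rdgn."""
    rows = conn.execute(
        "SELECT rinf.kw FROM rinf JOIN rdgn ON rdgn.id = rinf.rdgn_id "
        "WHERE rdgn.entr = ? ORDER BY rinf.kw",
        (entr,),
    )
    return [row[0] for row in rows]


conn_a = sqlite3.connect(":memory:")
conn_a.executescript(COMPOSITE)
conn_a.execute("INSERT INTO rdgn VALUES (1, 1, 'reading-1')")
conn_a.execute("INSERT INTO rinf VALUES (1, 1, 'ok')")

conn_b = sqlite3.connect(":memory:")
conn_b.executescript(SURROGATE)
conn_b.execute("INSERT INTO rdgn VALUES (100, 1, 'reading-1')")
conn_b.execute("INSERT INTO rinf VALUES (100, 'ok')")
```

Both layouts answer the same question for entry 1. If the proposed schema follows the first one, I would understand the repetition as part of the key; if not, the second looks simpler to me.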
I'm sorry if my English is a little bit weird... I hope my explanations
are clear enough to enable a discussion.
Best regards.