Re: [edict-jmdict] JMdict download format
On 09/24/2015 04:36 PM, Jim Breen jimbreen@gmail.com [edict-jmdict] wrote:
> On 25 September 2015 at 04:25, Stuart McGraw smcg6347@outlook.com [edict-jmdict] wrote:
>[...]
> I shudder to think what a CSV would look like, and I really have
> to wonder how it would be an improvement over the XML.
The main improvement is that csv (or sql) could be loaded into
any database directly, without parsing xml. Parsing the xml
typically means either 1) using the jmdict loader in the JMdictDB
software, which is limited to Postgresql, 2) writing one's own xml
parsing code, a non-trivial task, or 3) using a generic xml-to-sql
converter, which typically results in a very poor schema.
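To give a sense of what option 2 involves even in the simplest case,
here is a minimal Python sketch that flattens JMdict-style XML into
rows, one list per table. It assumes the usual JMdict element names
(entry, ent_seq, k_ele/keb, r_ele/reb, sense/gloss), uses a tiny
hand-made sample rather than the real file, ignores everything else
in an entry (pos, misc, restrictions, etc.), and uses ent_seq as the
row key. The table names and the flatten() helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-made sample in the general shape of a JMdict entry; the values are
# illustrative, not copied from JMdict.
SAMPLE = """<JMdict>
<entry>
<ent_seq>1</ent_seq>
<k_ele><keb>明白</keb></k_ele>
<r_ele><reb>めいはく</reb></r_ele>
<sense><gloss>obvious</gloss><gloss>clear</gloss></sense>
</entry>
</JMdict>"""


def flatten(xml_text):
    """Flatten JMdict-style XML into per-table lists of row tuples.

    Only keb, reb and gloss text are kept; everything else in an entry
    is ignored in this sketch. ent_seq is used as the entry key, and
    items are numbered from 1 within their parent element.
    """
    rows = {"entr": [], "kanj": [], "rdng": [], "gloss": []}
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        seq = int(entry.findtext("ent_seq"))
        rows["entr"].append((seq,))
        for k, keb in enumerate(entry.findall("k_ele/keb"), start=1):
            rows["kanj"].append((seq, k, keb.text))
        for r, reb in enumerate(entry.findall("r_ele/reb"), start=1):
            rows["rdng"].append((seq, r, reb.text))
        for s, sense in enumerate(entry.findall("sense"), start=1):
            for g, gloss in enumerate(sense.findall("gloss"), start=1):
                rows["gloss"].append((seq, s, g, gloss.text))
    return rows


if __name__ == "__main__":
    for table, table_rows in flatten(SAMPLE).items():
        print(table, table_rows)
```

For the real (large) file one would probably want ET.iterparse()
instead of reading everything with fromstring(), and the many other
elements would each need their own tables and numbering rules, which
is where the real work lies.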
The tradeoff made by loading from csv or sql is that the database
schema being loaded into is predetermined. The tables and their
columns must all exist and be compatible with the schema the csv/sql
was generated from, which is of course the schema defined by JMdictDB.
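That tradeoff can be illustrated with a small Python/SQLite sketch:
the csv is only loadable into a table that already exists with
matching columns. The load_csv() helper, the rdng table and its
columns are hypothetical; checking the csv header against an expected
column list is just one way to catch a schema mismatch early.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, expected_cols, csv_text):
    """Insert CSV rows into an already-existing table.

    The CSV header must match expected_cols exactly; otherwise nothing is
    inserted. Table and column names come from the caller's schema, not
    from the CSV, and the row values go through ? placeholders.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    if header != expected_cols:
        raise ValueError(f"{table}: CSV columns {header} != schema {expected_cols}")
    cols = ", ".join(expected_cols)
    marks = ", ".join("?" for _ in expected_cols)
    cur = conn.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", reader)
    return cur.rowcount  # number of rows inserted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # The table has to exist, with compatible columns, before loading.
    conn.execute("CREATE TABLE rdng (entr INTEGER, rdng INTEGER, txt VARCHAR(255))")
    print(load_csv(conn, "rdng", ["entr", "rdng", "txt"], "entr,rdng,txt\n1,1,めいはく\n"))
```

If the table has not been created first, sqlite3 raises
OperationalError ("no such table"); other databases fail in their
own ways, but the point is the same.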
>> Such a format will be tied to a specific (version of a) database
>> schema. Of course that schema is publicly available but would
>> need to be advertised along with the .sql formatted files. It is
>> also quite Postgresql-specific and if .sql (or .csv) files were
>> published, it would be desirable to publish a version of the
>> schema with the postgresql-specific bits elided.
>
> That really goes to the heart of it. If an "sql" file differs between
> Postgresql, MySQL, etc. then I don't think we should consider making
> one available, as we'd be heading into all the issues that are
> avoided by just publishing in XML.
The distributed sql (or csv) file containing the dictionary data
would be loadable, unchanged, into any common database. But the
database would need to have all the tables pre-created, with the
expected columns and datatypes for those columns, before loading
the data. The JMdictDB script that creates those tables is what I
meant by "very Postgresql-specific": one would need to do a lot of
editing to use it to create the tables in, for example, a MySQL
database.
But if all one wanted was to create tables suitable for loading
sql or csv jmdict data into, without regard for using the database
with the rest of the JMdictDB code (which is what requires all
the Postgresql bells and whistles), then a more generic script
for creating the schema could be written using only standard sql.
Such a script could be used on most databases with few or no changes.
Maybe someone (other than me :-) would be interested in doing that.
But then there would be little point unless the jmdict data were
going to be distributed in sql or csv format.
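For what it's worth, a generic schema script of that kind might look
something like the following sketch. The table names loosely follow
JMdictDB's, but the columns are a hypothetical, much-simplified subset
rather than the real schema, and only standard types and constraints
are used (no SERIAL, no custom types or functions). It is exercised
here only with SQLite via Python's sqlite3 module; that it runs
unchanged on other databases is an assumption, not something tested.

```python
import sqlite3

# Hypothetical, simplified schema written with standard SQL only: plain
# INTEGER/SMALLINT/VARCHAR types and PRIMARY KEY / FOREIGN KEY table
# constraints. Table names loosely follow JMdictDB; the columns are NOT
# the real JMdictDB schema.
PORTABLE_DDL = """
CREATE TABLE entr (
    id    INTEGER NOT NULL,
    seq   INTEGER NOT NULL,
    PRIMARY KEY (id)
);
CREATE TABLE kanj (
    entr  INTEGER NOT NULL,
    kanj  SMALLINT NOT NULL,
    txt   VARCHAR(255) NOT NULL,
    PRIMARY KEY (entr, kanj),
    FOREIGN KEY (entr) REFERENCES entr (id)
);
CREATE TABLE rdng (
    entr  INTEGER NOT NULL,
    rdng  SMALLINT NOT NULL,
    txt   VARCHAR(255) NOT NULL,
    PRIMARY KEY (entr, rdng),
    FOREIGN KEY (entr) REFERENCES entr (id)
);
CREATE TABLE sens (
    entr  INTEGER NOT NULL,
    sens  SMALLINT NOT NULL,
    PRIMARY KEY (entr, sens),
    FOREIGN KEY (entr) REFERENCES entr (id)
);
CREATE TABLE gloss (
    entr  INTEGER NOT NULL,
    sens  SMALLINT NOT NULL,
    gloss SMALLINT NOT NULL,
    txt   VARCHAR(2048) NOT NULL,
    PRIMARY KEY (entr, sens, gloss),
    FOREIGN KEY (entr, sens) REFERENCES sens (entr, sens)
);
"""


def create_schema(conn):
    """Create all the tables; executescript runs the statements in order."""
    conn.executescript(PORTABLE_DDL)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print([r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")])
```

Loading the distributed data would then be a matter of plain INSERTs
or each database's own bulk loader (COPY in PostgreSQL, LOAD DATA
INFILE in MySQL, .import in the sqlite3 shell).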