
RE: [edict-jmdict] mysql limitations



Jim Breen wrote:
> [Stuart McGraw (RE: [edict-jmdict] mysql limitations) writes:]
> >> Jim Breen wrote:
[...]
> >> > I guess all other things being equal, having efficient subquery processing
> >> > is a Good Thing(TM), but if guru-level activity is going to be sparse,
> >> > performance with subqueries may not be a huge issue.
> >> 
> >> But subselects are not guru-level activity.
> >> Please believe me, anything that comes from me is assuredly 
> >> not guru level!!
> >> 
> >> The reason I was interested in the query I posted is that it provides
> >> a concise look at an entry.  If you've played with the schema at all 
> >> you have undoubtedly discovered that looking at a table and seeing a 
> >> bunch of id numbers is not very enlightening.  If you are debugging an 
> >> app, query, stored procedure, whatever, and it is doing something with 
> >> entry 40877 it is nice to know what that entry is (in terms of the kanji, 
> >> glosses, etc associated with it).  The query lets you see the entry, 
> >> and its kanji, reading, and gloss strings all in one row.
> 
> It's an interesting process. You have to do an elaborate deconstruction
> of an entry to get it into all those tables in order to give you the 
> flexibility and scope needed, and then you need to do an equally
> elaborate reconstruction to get it back to an entry you can see as whole.

<Chuckle> yes, I hadn't thought about it that way before. :-)
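A minimal sketch of that reconstruction follows. It assumes a drastically simplified, hypothetical schema (the table names entr, kanj, rdng and gloss are illustrative only, not the real jmdict schema). One correlated subquery per child table folds that table's rows into a single string, so the entry comes back as one row. It uses Python's sqlite3 module only so that it is self-contained; in MySQL, GROUP_CONCAT() plays the same role and also accepts ORDER BY and SEPARATOR clauses.

```python
import sqlite3

# Hypothetical, heavily simplified schema (NOT the real jmdict schema):
# one row per entry, with kanji, readings and glosses in child tables.
SCHEMA = """
CREATE TABLE entr  (id INTEGER PRIMARY KEY, seq INTEGER NOT NULL);
CREATE TABLE kanj  (entr INTEGER NOT NULL REFERENCES entr(id), txt TEXT NOT NULL);
CREATE TABLE rdng  (entr INTEGER NOT NULL REFERENCES entr(id), txt TEXT NOT NULL);
CREATE TABLE gloss (entr INTEGER NOT NULL REFERENCES entr(id), txt TEXT NOT NULL);
"""

# One correlated subquery per child table collapses its rows into a single
# string, so the whole entry comes back as one row.  SQLite does not
# guarantee the order group_concat() uses within each string.
ENTRY_SUMMARY = """
SELECT e.id, e.seq,
       (SELECT group_concat(k.txt, '; ') FROM kanj  k WHERE k.entr = e.id),
       (SELECT group_concat(r.txt, '; ') FROM rdng  r WHERE r.entr = e.id),
       (SELECT group_concat(g.txt, '; ') FROM gloss g WHERE g.entr = e.id)
FROM entr e
WHERE e.id = ?
"""


def entry_summary(conn, entr_id):
    """Return (id, seq, kanji, readings, glosses) for one entry, or None."""
    return conn.execute(ENTRY_SUMMARY, (entr_id,)).fetchone()


def build_sample_db():
    """In-memory database holding the 半々 example entry from this thread."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO entr VALUES (1, 1601210)")
    conn.executemany("INSERT INTO kanj VALUES (1, ?)", [("半々",), ("半半",)])
    conn.execute("INSERT INTO rdng VALUES (1, 'はんはん')")
    conn.executemany("INSERT INTO gloss VALUES (1, ?)",
                     [("half and half",), ("fifty-fifty",)])
    return conn
```

entry_summary(build_sample_db(), 1) gives one row holding the id, the sequence number, and the kanji, reading and gloss strings, though the order inside each concatenated string is not promised by SQLite.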

> I guess that's the up-front pain of moving to a DBMS.

There's some pain with any change.  And if one is used to 
programming, especially in a procedural language like C, 
SQL (which is basically a declarative language that operates 
on sets) can seem pretty foreign initially.  At least it 
was that way for me. 

One of the biggest payoffs with a database system is 
data consistency.  The database enforces rules that 
prevent inconsistent data from ever getting in there.  
For example, just the exercise of loading jmdict into 
a database identified a lot of minor problems.  I found 
a number of entities that were used inconsistently.  
Someone else (Ronan?) posted a list of xrefs without 
valid targets.  My schema doesn't allow a reading to be 
marked "re_nokanji" if the entry has no kanji at all.  
As a result I found that entries 2066740, 2067160, 
2067300, and 2067680 all have this condition.  All of this 
consistency checking helps prevent the "bit-rot" that 
tends to afflict large collections of data.
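As a rough illustration of the database doing that checking itself, here is a sketch using SQLite from Python with a hypothetical schema fragment: foreign keys refuse an xref whose target entry does not exist, and a trigger refuses a reading flagged as no-kanji on an entry that has no kanji. The table and column names (entr, kanj, rdng, xref, nokanji) are assumptions for illustration, not the actual jmdict schema.

```python
import sqlite3

# Hypothetical schema fragment (names are illustrative, not jmdict's) showing
# how the database itself can refuse two of the problems mentioned above.
SCHEMA = """
CREATE TABLE entr (id INTEGER PRIMARY KEY);
CREATE TABLE kanj (entr INTEGER NOT NULL REFERENCES entr(id),
                   txt  TEXT NOT NULL);
CREATE TABLE rdng (entr    INTEGER NOT NULL REFERENCES entr(id),
                   txt     TEXT NOT NULL,
                   nokanji INTEGER NOT NULL DEFAULT 0 CHECK (nokanji IN (0, 1)));
-- Every xref must point at an entry that exists.
CREATE TABLE xref (entr   INTEGER NOT NULL REFERENCES entr(id),
                   target INTEGER NOT NULL REFERENCES entr(id));
-- A reading may be flagged "no kanji" only if its entry has kanji.
-- (A complete version would also need UPDATE and DELETE triggers.)
CREATE TRIGGER rdng_nokanji BEFORE INSERT ON rdng
WHEN NEW.nokanji = 1
 AND NOT EXISTS (SELECT 1 FROM kanj WHERE kanj.entr = NEW.entr)
BEGIN
    SELECT RAISE(ABORT, 're_nokanji on an entry with no kanji');
END;
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only if asked
    conn.executescript(SCHEMA)
    return conn


def rejected(conn, sql, params=()):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

With rules like these in place, the dangling xrefs and the re_nokanji entries mentioned above would be refused at load time rather than discovered afterwards.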

> As I have been watching the DBMS discussion unfold, I have been pondering
> the suggestion from someone a while ago (it may have been off-line) just
> to drop a complete entry into a text window and let someone edit it.

Instead of a database-based solution?

> For example, replacing my simple markup with XML-like labels, the
> entry: 半々(P); 半半 【はんはん】 (n) half and half; fifty-fifty; (P) 
> is stored:
> 
>     <entry> 1601210
>     半々
>     <k_pri> news1 nf18
>     半半
>     <reading>
>     はんはん
>     <re_pri> news1 nf18
>     <sense>
>     <pos> n
>     half and half
>     fifty-fifty
> 
> I guess for the average user about to fiddle with an entry, having things
> in defined fields is a lot more understandable, and easier to check.

Of course, one can have a database behind this 
interface as easily as a file.
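Whether the text ends up in a file or a database, it first has to be parsed back into fields. Below is a minimal sketch of such a parser for the labelled format quoted above. From that single example it assumes that unlabelled lines are kanji, readings, or glosses depending on the most recent <entry>, <reading>, or <sense> label, and that <k_pri> and <re_pri> attach to the line just before them. The function name and the dict layout it returns are hypothetical.

```python
import re

# "<label> values...": label names come from the example entry above;
# how each one is interpreted here is an assumption based on that example.
LABEL = re.compile(r"<(\w+)>\s*(.*)")


def parse_entry(text):
    """Parse one labelled-text entry into a dict; raise ValueError on errors."""
    entry = {"seq": None, "kanji": [], "readings": [], "senses": []}
    section = None  # which list unlabelled lines go into
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        m = LABEL.match(line)
        if m is None:
            # Unlabelled line: a kanji, reading or gloss, depending on section.
            if section == "kanji":
                entry["kanji"].append({"txt": line, "pri": []})
            elif section == "reading":
                entry["readings"].append({"txt": line, "pri": []})
            elif section == "sense":
                entry["senses"][-1]["gloss"].append(line)
            else:
                raise ValueError(f"line {lineno}: text before <entry>: {line!r}")
            continue
        label, values = m.group(1), m.group(2).split()
        if label == "entry" and len(values) == 1 and values[0].isdecimal():
            entry["seq"] = int(values[0])
            section = "kanji"
        elif label in ("reading", "sense") and not values:
            section = label
            if label == "sense":
                entry["senses"].append({"pos": [], "gloss": []})
        elif label == "k_pri" and section == "kanji" and entry["kanji"]:
            entry["kanji"][-1]["pri"].extend(values)
        elif label == "re_pri" and section == "reading" and entry["readings"]:
            entry["readings"][-1]["pri"].extend(values)
        elif label == "pos" and section == "sense":
            entry["senses"][-1]["pos"].extend(values)
        else:
            raise ValueError(f"line {lineno}: unexpected {line!r}")
    return entry
```

Every error names the offending line, which is the minimum a user needs when a submitted edit fails to parse.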

One nice thing about forms is their dropdown/combo/selection 
boxes.  Without those one needs a pretty 
extraordinary help system in order to let the casual 
user know quickly that they need to type "adj-na" 
rather than "na-adj" or "adjna".   Or the parser has 
to be very heuristic.
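Without a dropdown, a text-based editor can at least catch a mistyped code and suggest the likely intended one. Here is a sketch using Python's difflib.get_close_matches(). The POS_CODES list is a hypothetical subset for illustration; a real version would load the full list of jmdict part-of-speech entities.

```python
import difflib

# Hypothetical subset of part-of-speech codes, for illustration only; a real
# editor would load the full list from the jmdict entity definitions.
POS_CODES = sorted({"n", "adj-i", "adj-na", "adj-no", "adv", "exp",
                    "int", "v1", "v5k", "vs"})


def close_pos(code, limit=4):
    """Return up to `limit` valid codes that look similar to `code`."""
    return difflib.get_close_matches(code, POS_CODES, n=limit, cutoff=0.4)


def pos_error(code):
    """Return None if `code` is valid, else an error message for the user."""
    if code in POS_CODES:
        return None
    msg = f"unknown part of speech {code!r}"
    guesses = close_pos(code)
    if guesses:
        msg += "; did you mean " + ", ".join(guesses) + "?"
    return msg
```

Fuzzy matching is a blunt tool: for "na-adj" it also suggests "adj-i" and "adj-no", so a real editor might add an explicit table of known misspellings on top of it.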

I built a task tracking system once that worked like 
that.  Task priorities, status, notes, assignments, 
etc. were all displayed in a big window as formatted 
text.  I used it for a long time by myself and liked 
it very much but when other people started to use it, 
they got frustrated at syntax errors when they submitted 
an edited entry and ended up making me rewrite it as a 
form-based system.  I still wonder if better help, error 
messages, and parsing could have made it viable. 

I am currently in the (very slow) process of moving the 
front-end of my personal jmdict database app from MS 
Access to Python/wxWidgets.  My motivation was that MS 
Access forms, although very fast to build a gui with, 
have a lot of limitations (that and I can't stand Visual 
Basic).  What I want is something like what you suggest, 
but read-only, not for editing.  I want a big text window 
in which an entry (or multiple entries) appear formatted 
much as you would see in a paper dictionary.  I find 
that much easier to read than a form, and a much more 
efficient use of screen real estate.  As for editing, 
that is still on the drawing board. I hope I will get 
some ideas from how it's done for jmdict.  :-)
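For the read-only display, here is a rough sketch of rendering entries as dictionary-style text, loosely modelled on the EDICT line quoted earlier in this thread but leaving out the (P) markers. The Entry and Sense classes are hypothetical stand-ins for whatever the database query returns; in a wxWidgets front end the resulting string could be placed in a multi-line text control.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sense:
    pos: list[str]        # e.g. ["n"]
    glosses: list[str]    # e.g. ["half and half", "fifty-fifty"]


@dataclass
class Entry:
    kanji: list[str]
    readings: list[str]
    senses: list[Sense] = field(default_factory=list)


def format_entry(e: Entry) -> str:
    """Render one entry as a single dictionary-style line."""
    readings = "; ".join(e.readings)
    # Kana-only entries have no kanji, so the reading becomes the headword.
    if e.kanji:
        parts = [f"{'; '.join(e.kanji)} 【{readings}】"]
    else:
        parts = [readings]
    numbered = len(e.senses) > 1  # number senses only when there are several
    for i, s in enumerate(e.senses, start=1):
        pos = f"({','.join(s.pos)}) " if s.pos else ""
        num = f"({i}) " if numbered else ""
        parts.append(pos + num + "; ".join(s.glosses))
    return " ".join(parts)


def format_entries(entries: list[Entry]) -> str:
    """One line per entry, ready to drop into a multi-line text window."""
    return "\n".join(format_entry(e) for e in entries)
```

Keeping the formatting in a plain function that returns a string leaves the GUI layer thin, and the same function could serve a command-line or web view.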