Re: [edict-jmdict] Re: [abbr=...] (Abbreviation cross-references?)
On 09/02/2011 07:51 PM, Jim Breen wrote:
> On 31 August 2011 13:36, Stuart McGraw <smcg4191@frii.com> wrote:
>
>> I have not looked at the code for a while so I reserve the
>> right to retract this, but I think the xref/abbr change
>> won't require any changes to the database structure, just
>> the code and db contents will need changing.
>
> I suspected that would be the case, however to carry it cleanly
> through into JMdict I'd need to change the DTD and generator,
> and if there are a few such changes in the pipeline it would be
> best to batch them.
A lot of the proposed xml changes IIRC would affect the
xml only, not the database. And database changes are
not a big problem -- besides changing the definition
files, I also create "upgrade" files to migrate an existing
database to the new version and there aren't a large number
of external users of the database AFAIK.
So the limiting factor is probably the desire not to change
the dtd too often (which I understand). Perhaps looking at
how often the dtd has been changed in the past and what
changes were made each time could give guidance as to when
another change is acceptable and how big/small it can be?
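
To illustrate the "upgrade" file approach mentioned above, here is a minimal sketch of applying ordered patches to bring an existing database up to the current structure. It is not the actual JMdictDB code: the use of sqlite3, the `dbversion` table, the `entr` table definition, and the patch contents are all assumptions chosen so the example is self-contained.

```python
import sqlite3

# Hypothetical upgrade patches, keyed by the schema version they produce.
# In a real setup each would probably live in its own .sql file.
PATCHES = {
    1: "CREATE TABLE entr (id INTEGER PRIMARY KEY, seq INTEGER);",
    2: "ALTER TABLE entr ADD COLUMN notes TEXT;",
}

def current_version(conn):
    """Return the schema version recorded in the database (0 if none)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dbversion (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM dbversion").fetchone()
    return row[0] or 0

def upgrade(conn, patches=PATCHES):
    """Apply, in order, every patch newer than the recorded version."""
    applied = []
    start = current_version(conn)
    for version in sorted(patches):
        if version <= start:
            continue  # already applied to this database
        conn.executescript(patches[version])
        conn.execute("INSERT INTO dbversion (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied
```

Because the recorded version decides which patches run, the same function would serve both a fresh install (all patches applied) and an existing install (only the newer ones).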
>> Not specifically for DB changes (that is, changes to
>> the definitions of the tables, views, and other objects
>> in the database) but there is a list of outstanding
>> tasks:
>>
>> http://www.edrdg.org/~smg/jmdict/TODO.html
>>
>> In some cases it is not clear, without further thinking or
>> sometimes experimentation, whether a db structure change
>> will be needed to complete a task.
>
> I/we should go through the list, esp the high/medium ones
> and select a batch to concentrate on.
If you can give some feedback as to which ones you'd like
to see prioritized, that would be useful. I'll see if I
can summarize the ones on my list, and scan past list posts
for discussions. I recall I was particularly keen to see
revision of the xref tags (which would include the xref
abbr type) because it would eliminate the need for resolving
xref targets heuristically and I could get rid of a lot of
very complex and non-confidence-inspiring import code.
Perhaps by focusing on a set of changes there is mostly
agreement on, a large enough set of changes can be found
to justify a dtd change?
I fear that if we seek too big a "super-update", then there
will never be agreement on all the details, and nothing at
all will ever get done.
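
On the xref point above, a small sketch may show why heuristic resolution is uncomfortable. It assumes, purely for illustration, that a revised xref could carry an unambiguous identifier such as an entry sequence number; the actual proposal may differ. The `Entry` class, both functions, and the sequence numbers are made up for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    seq: int          # entry sequence number (values below are invented)
    kanji: tuple      # kanji forms
    readings: tuple   # reading forms

# Illustrative data only: three words sharing the reading "かみ".
ENTRIES = [
    Entry(1001, ("紙",), ("かみ",)),
    Entry(1002, ("神",), ("かみ",)),
    Entry(1003, ("髪",), ("かみ",)),
]

def resolve_by_text(entries, text):
    """Heuristic lookup: every entry whose kanji or reading matches."""
    return [e for e in entries if text in e.kanji or text in e.readings]

def resolve_by_seq(entries, seq):
    """Explicit lookup: a sequence number identifies at most one entry."""
    for e in entries:
        if e.seq == seq:
            return e
    return None
```

A text-only target such as "かみ" matches all three sample entries, so the importer has to guess; an explicit identifier leaves nothing to guess.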
> One feature I'd like to see is an email notification associated with
> edits (if X has proposed a change to an entry, then they get emailed
> the edit history), sent either automatically or on the request of an
> editor. I often email people directly to advise the outcome of an
> edit or to ask for further information, and having this
> (semi)automated would be great.
That's pretty doable I think. (But see my comment below re
web frameworks.) A problem (which is also a problem for the
url on the submission "thank you" page) is that a link to a
specific entry (which one would want to send in the email)
may become invalid if the entry is approved, and a link to
the edit tree as a whole can make it hard to find one's entry
if there is much branching. A to-do item is to find a way of
presenting edits without all the comment duplication that is
present in the current "updates" pages. Whatever solution is
found here would be applicable to the urls used in email
responses.
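
As a rough idea of what the notification itself might look like, here is a minimal sketch using Python's standard `email.message.EmailMessage`. It puts the edit history in the message body rather than a link, which sidesteps (but does not solve) the link-stability problem described above. The function name, the shape of `history`, and the sender address are assumptions, not part of the existing code.

```python
from email.message import EmailMessage

def build_notification(to_addr, seq, history,
                       from_addr="noreply@example.org"):
    """Build (but do not send) a message summarizing an entry's edits.

    `history` is a list of (editor, status, comment) tuples, oldest
    first. This layout is hypothetical; the real edit records may be
    structured quite differently.
    """
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = f"Edit history for entry {seq}"
    lines = [f"{editor} [{status}]: {comment}"
             for editor, status, comment in history]
    msg.set_content("\n".join(lines))
    return msg
```

Sending would be a separate step, for example with `smtplib.SMTP(...).send_message(msg)`, and could be triggered either automatically on a status change or by an editor.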
>>> 2. in general, on what time scale do we expect the DB format
>>> to change? Annually? 5 years? 20 years?
>>
>> DB is changed whenever needed -- there is no particular
>> schedule although obviously it is not something done casually.
>> When a change is made, the scripts that create a new database
>> are changed, and a sql patch file (or set of same) is made
>> that will update an existing database to the new structure.
>> This hopefully keeps things working and in sync whether
>> installing the software for the first time, or updating an
>> existing install (as the wwwjdict submission system is.)
>
> One year in, it's working pretty well from my point of view. Of course
> it depends on some dedicated editors.
I am working on a fully-automated AI editor but I still
have a few bugs to work out. :-)
>> Of course, all these considerations are modulo the time
>> needed from Jim, and availability of same for him.
>
> Naruhodo.
>
> One thing I have noticed is that we often get into lengthy
> discussions via the comments on the entries. I think these
> form a very valuable record, and it's great to have them there.
> I think they'd be even more valuable if they were more visible
> to the community. Some other dictionary systems have a front
> page with summaries, recent additions, recent comments, etc.
> Something like that sitting at the front of the system would
> be great. At present the raw functionality is great, but the PR
> aspects are not so clear.
Agreed. I did not appreciate the social networking aspects
of it when I came up with the original design. I was
envisioning something like a simple source code control
system with the comments being terse rationales for the
changes made in an edit, not the sort of discussion board
it seems to be used as.
(As an aside, since the comments and references *are* generally
useful, I would be happy to see them distributed in some form;
if not in the xml, then perhaps as auxiliary files. Information
accessible only from someone's web site puts the information
somewhat at risk.)
One of the things I've been wondering about, and looking into
a little, is redoing the web pages with some kind of web framework.
Such a framework would have a lot of features like authentication
and sessions that I've implemented in a half-assed way.
Email responses would probably be another such feature, so we
might want to look into frameworks before spending a lot of time
implementing email responses by hand.
A mixed blessing with Python is that there is no canonical
package, which means one can pick among a bunch of contenders
with different strengths and weaknesses, but that also means
a big time commitment to evaluate them in depth.
Another possibility would be to go up another level and use
some kind of prebuilt discussion forum / social networking
package into which the code for the database updates could
be integrated. However, I have no idea what the options are.