Re: [edict-jmdict] Deleted entries markers absent from JMdict files



On 01/23/2011 03:46 PM, Jean-Luc Léger wrote:
> On Sun, 23 Jan 2011 10:34:57 -0700, Stuart McGraw <smcg4191@frii.com>
> wrote:
>> On 01/21/2011 10:51 PM, Glenn Maynard wrote:
>>> On Fri, Jan 21, 2011 at 11:53 PM, Stuart McGraw <smcg4191@frii.com>
>>> wrote:
>>>> Alternatively, a dedicated tag could be used:
>>>>
>>>> ...
>>>> <audit>
>>>> <upd_date>2011-01-20</upd_date>
>>>> <upd_merged>1234560</upd_merged>
>>>> </audit>
>>> 
>>> I'd strongly recommend the above over the other, because it doesn't
>>> require additional parsing.  FYI, I'd recommend:
>>> 
>>> <audit type="merge" id="1234560" date="2011-01-20" />
>> 
>> On the other hand, although the other alternative:
>> 
>> <audit>
>> <upd_date>2011-01-20</upd_date>
>> <upd_detl>Entry 1234560 merged</upd_detl>
>> </audit>
>> 
>> does require parsing, only those apps that care about tracking merges
>> need to do so -- to apps that don't care it remains a normal history 
>> comment that can provide human-visual information about the merge with
>> no change to the app.
>> 
>> Don't know the ratio of merge-tracking apps to non-tracking ones so 
>> it is hard to know how to weight the two alternatives.
>> 
>> And as I said in another post, both are second-rate alternatives to
>> providing more formal and explicit tracking in the database to prohibit
>> inconsistencies.
>> 
> 
> Some thoughts about the whole thread :
> 
> 1) I don't think there are many applications using the upd_date/upd_detl
> fields or their maintainers would have come to us asking why there are
> several amendments a day for so many entries. 

Agreed, but I also think that for every user who posts a message 
about something, there are N times more who suffer silently.  As to 
the value of N, I suspect it is fairly large.  But since only one person
raised the disappearing merge comment issue, N * 1 may still be small 
in absolute terms. :-)

> When in fact, it is just
> that every comment, every uncommitted change to an entry is
> recorded in the XML file. I expect people really using these fields would
> have suggested that we record only approvals.
> Take a new entry submitted in December but approved only in January. Which
> date would one expect to find as the date of creation? December or January?
> Date of submission or date of appearance in the XML file?

Well, currently I think the creation date would be December.  Since status
change (unapproved -> approved) information is recorded in the database
history records, it would be possible to include that data in the xml allowing
the user to choose the date of interest.

> 2) If breaking the existing format is really a problem, you can always
> imagine a separate file with that merge information.

Yes, that is an interesting option I hadn't thought of.  However, I think
the issue can be handled while maintaining xml compatibility as in my just-
mailed response to Glenn.

> 3) Merging entries is not the only type of change an application would
> want to track. Splitting an entry into 2 or more, or merely moving
> information from one entry to another, would be interesting to track.
> Now, that's probably complicated to manage but before choosing a solution
> for the merge problem make sure you will not have to break everything
> when/if the split/move problem is resolved.

Splits could be handled similarly to merges, that is, a history comment
in the database like, "split from 1234560".  Similar to my proposed 
merge-edit page, a split-edit page could have two sets of edit boxes,
one populated with an existing entry, the other blank to be filled in with
data extracted from the first set.  The changes made to the first set of
boxes would result in an edited version of the original 1234560 entry, 
the second set of boxes would produce a new entry with the "split from
1234560" comment.
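
To make the parsing cost concrete, here is a rough sketch of what a
tracking app might do with such comments.  It assumes the comment wording
used in this thread ("Entry N merged", "split from N"), which is not an
agreed format, and the function name and sample entry are made up for
illustration:

```python
import re
import xml.etree.ElementTree as ET

# Assumed comment conventions, taken from the examples in this thread.
# They are not an agreed-upon JMdict format.
MERGE_RE = re.compile(r"^Entry (\d+) merged$")
SPLIT_RE = re.compile(r"^split from (\d+)$")

def tracked_events(entry_xml):
    """Return (kind, other_entry_id, date) tuples found in <audit> records."""
    events = []
    root = ET.fromstring(entry_xml)
    for audit in root.iter("audit"):
        date = audit.findtext("upd_date", default="")
        detail = audit.findtext("upd_detl", default="").strip()
        for kind, pattern in (("merge", MERGE_RE), ("split", SPLIT_RE)):
            match = pattern.match(detail)
            if match:
                events.append((kind, int(match.group(1)), date))
    return events

# Made-up sample entry; the sequence number 1234570 is hypothetical.
sample = """<entry>
<ent_seq>1234570</ent_seq>
<info>
<audit><upd_date>2011-01-20</upd_date><upd_detl>Entry 1234560 merged</upd_detl></audit>
<audit><upd_date>2011-01-22</upd_date><upd_detl>Entry amended</upd_detl></audit>
</info>
</entry>"""

print(tracked_events(sample))
```

Apps that don't track merges or splits never call anything like this and
just show the upd_detl text as an ordinary history comment.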

Of course this would only handle a two-way split.  I suppose you could 
then split one of the resulting entries a second time, although admittedly
this could be tiresome if N>2 splits occur often.

This approach will have the same (potential lack of) consistency issues
as the merge proposal.

Another problem may be educational -- getting people to use the merge and
split pages rather than doing merges and splits by editing the two entries 
separately with the standard edit page.

Disclaimer: the above is off the top of my head.  I agree it needs to be
better thought out or implemented prior to putting the merge stuff into 
production.