Re: [edict-jmdict] <xref> tags without a destination or with too many destinations
On 30 October 2010 07:59, Glenn Maynard <glenn@zewt.org> wrote:
> On Fri, Oct 29, 2010 at 12:10 PM, Jean-Luc Léger
> <jean-luc.leger@dspnet.fr> wrote:
>>> <xref type="see" seq="1594530" sense="B">仕舞い・1</xref>
>>>
>>> that "1" would probably have to come from a reference to the target
>>> entry to find out what number sense-B is today.
>>
>> As "sense-B" can give me its order, then "・1" is useless.
>> That sense-B should look like this :
>>
>> <sense id="B">
>
> I recommend:
>
> <sense id="4783590">
> <xref type="see" seq="1594530" sense="4783590">仕舞い</xref>
>
> where 4783590 is a unique number, for the reasons I mentioned earlier.
This is the basic idea which I had come up with for the database I
made. In the original attachment at the start of this discussion, the
numbers at the start of each column (the numbers I said to ignore) are
the unique sense numbers which I assigned while reading sequentially
through the file. I think what Glenn proposes is the right approach,
rather than numbering the senses relative to the parent <entry>.
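A minimal Python sketch of the unique-numbering approach follows. It assumes the proposed (not the current) format: each <sense> carries a file-wide unique `id`, and each <xref> names its target with `seq` and `sense` attributes. The sample data is made up apart from seq 1594530 and sense 4783590 from the quoted example. The position of a sense within its entry is computed at lookup time, so it never has to be stored in the xref:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the proposed format. Sense ids other than
# 4783590, the second entry's seq, and all glosses are placeholders.
SAMPLE = """<JMdict>
<entry><ent_seq>1594530</ent_seq>
  <k_ele><keb>仕舞い</keb></k_ele>
  <sense id="4783589"><gloss>(first sense)</gloss></sense>
  <sense id="4783590"><gloss>(second sense)</gloss></sense>
</entry>
<entry><ent_seq>1000001</ent_seq>
  <k_ele><keb>例</keb></k_ele>
  <sense id="4783591">
    <gloss>(referring sense)</gloss>
    <xref type="see" seq="1594530" sense="4783590">仕舞い</xref>
  </sense>
</entry>
</JMdict>"""


def index_senses(root):
    """Map each unique sense id to (ent_seq, 1-based position in its entry)."""
    index = {}
    for entry in root.iter("entry"):
        seq = entry.findtext("ent_seq")
        for pos, sense in enumerate(entry.findall("sense"), start=1):
            index[sense.get("id")] = (seq, pos)
    return index


def resolve_xref(xref, index):
    """Return (seq, position) of the sense an xref points at.

    The position is worked out from the current file, so reordering
    senses never invalidates the xref itself.
    """
    seq, pos = index[xref.get("sense")]
    # Sanity check: the seq attribute should agree with the owning entry.
    if seq != xref.get("seq"):
        raise ValueError("xref seq does not match the entry holding the sense")
    return seq, pos


root = ET.fromstring(SAMPLE)
index = index_senses(root)
for xref in root.iter("xref"):
    print(xref.text, resolve_xref(xref, index))  # 仕舞い ('1594530', 2)
```

If the senses of entry 1594530 were later reordered, only the computed position would change; the xref would still point at the same sense.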
> It also means that reverse lookups can be done more easily (finding
> all xrefs to a sense).
I don't know of any software which does these reverse lookups. What
are you thinking of specifically?
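For concreteness, under the same assumed format a reverse lookup (finding all xrefs to a sense) reduces to a single pass that inverts the xref attributes. This is a hypothetical sketch with made-up ids and sequence numbers; a dictionary viewer showing "referred to by" links on the target entry is one conceivable use, not a known existing one:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical data in the proposed format; ids 4783591/4783592 and
# seqs 1000001/1000002 are invented for illustration.
SAMPLE = """<JMdict>
<entry><ent_seq>1594530</ent_seq>
  <sense id="4783590"><gloss>(target sense)</gloss></sense>
</entry>
<entry><ent_seq>1000001</ent_seq>
  <sense id="4783591"><xref type="see" seq="1594530" sense="4783590">仕舞い</xref></sense>
</entry>
<entry><ent_seq>1000002</ent_seq>
  <sense id="4783592"><xref type="see" seq="1594530" sense="4783590">仕舞い</xref></sense>
</entry>
</JMdict>"""


def build_reverse_index(root):
    """Map each target sense id to the ids of the senses whose xrefs point at it."""
    reverse = defaultdict(list)
    for sense in root.iter("sense"):
        for xref in sense.findall("xref"):
            reverse[xref.get("sense")].append(sense.get("id"))
    return reverse


root = ET.fromstring(SAMPLE)
reverse = build_reverse_index(root)
print(reverse["4783590"])  # ['4783591', '4783592']
```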
> I also agree with removing the ad-hoc dotted
> separator; encoding data is XML's job.
Yes, the dotted separator in the xref data is a tragedy. If you have
tried to write code which can parse that dotted list, decide whether
the first item is a reb, a keb, or something else and whether the
second item is a number or a reb, and so on, and then look up the
corresponding element, you will know that. I am guessing that a lot
of people didn't even try to jump this hurdle; otherwise a lot more
noise would have been made about the <xref>s before now. If I didn't
have a fair amount of experience of trying to parse Japanese text, I
probably would have given up too. As it is, the parser I've written is
fragile.
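To illustrate why the dotted format is hard to handle, here is a rough heuristic splitter in Python. It assumes the <xref> text uses KATAKANA MIDDLE DOT (U+30FB) as the separator and takes one of the forms keb, reb, keb・reb, keb・N, reb・N or keb・reb・N. The function names and the "kana only means reb" test are assumptions for illustration, and the sketch stops before the harder step of looking up the matching element:

```python
SEP = "\u30fb"  # KATAKANA MIDDLE DOT, assumed to be the xref separator


def is_kana(text):
    """True if text is non-empty and every character is hiragana or katakana."""
    return bool(text) and all("\u3040" <= ch <= "\u30ff" for ch in text)


def parse_xref_text(text):
    """Heuristically split dotted xref text into (keb, reb, sense_number).

    Missing parts come back as None. A headword or reading that itself
    contains a middle dot is split in the wrong place, and a kana-only
    keb is mistaken for a reb: this is why such parsers are fragile.
    """
    parts = text.split(SEP)
    sense = None
    # A trailing all-digit part is taken to be a sense number.
    if len(parts) > 1 and parts[-1].isdecimal():
        sense = int(parts.pop())
    if len(parts) == 1:
        if is_kana(parts[0]):
            return None, parts[0], sense  # reading only
        return parts[0], None, sense  # headword only
    if len(parts) == 2:
        return parts[0], parts[1], sense  # headword, then reading
    raise ValueError(f"cannot parse xref text: {text!r}")


print(parse_xref_text("仕舞い・1"))         # ('仕舞い', None, 1)
print(parse_xref_text("仕舞い・しまい・2"))  # ('仕舞い', 'しまい', 2)
```

A hypothetical katakana phrase such as ジョン・ウェイン, whose middle dot belongs to the word itself, comes back wrongly as the keb/reb pair ('ジョン', 'ウェイン', None). Structured attributes on <xref> would remove that ambiguity entirely.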