
Errors in cross-references



Hi again,

while trying to resolve cross-references in JMdict, I found three that point to JMnedict entries:

1. <xref>フィレオフィッシュ</xref> in seq 2137630
2. <xref>NHK</xref> in seq 2176930
3. <xref>タミフル</xref> in seq 2648970

(That would be OK if there were a way to mark them as referring to entries in another dictionary. However, as there are only three such cross-references, it seems more likely that they are simply errors.)
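
For illustration, a check like this can be sketched in a few lines of Python. This is only a minimal sketch, not the script I actually used: the sample entries and their seq numbers are invented, the function name unresolved_xrefs is hypothetical, and the first centre-dot-separated part of each xref is simply assumed to be the headword.

```python
import xml.etree.ElementTree as ET

# Invented two-entry sample in JMdict's element layout (not real data).
SAMPLE = """<JMdict>
<entry><ent_seq>1000001</ent_seq>
<k_ele><keb>猫</keb></k_ele><r_ele><reb>ねこ</reb></r_ele>
<sense><xref>犬</xref><xref>NHK</xref><gloss>cat</gloss></sense></entry>
<entry><ent_seq>1000002</ent_seq>
<k_ele><keb>犬</keb></k_ele><r_ele><reb>いぬ</reb></r_ele>
<sense><gloss>dog</gloss></sense></entry>
</JMdict>"""


def unresolved_xrefs(root):
    """Return (seq, xref text) pairs whose headword matches no keb/reb in the file."""
    headwords = {el.text for el in root.iter() if el.tag in ("keb", "reb")}
    missing = []
    for entry in root.iter("entry"):
        seq = entry.findtext("ent_seq")
        for xref in entry.iter("xref"):
            # Simplification: treat the part before the first centre-dot as the headword.
            target = xref.text.split("・")[0]
            if target not in headwords:
                missing.append((seq, xref.text))
    return missing


root = ET.fromstring(SAMPLE)
print(unresolved_xrefs(root))
```

In the sample, the reference to 犬 resolves and the reference to NHK does not, which is the situation described above.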

Additionally, there are two references that appear twice within the same entry and sense:

1. <xref>豚トロ・とんトロ・1</xref> appears twice in seq 2677780 (and the sense number is unnecessary)
2. <xref>無線呼出符号・1</xref> appears twice in seq 2827567 (and the sense number is unnecessary)
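
Duplicates of this kind are easy to detect mechanically. A minimal sketch, assuming one parsed entry element; the function name duplicate_xrefs and the sample entry (including its seq) are made up for the example:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented entry: the same xref occurs twice in sense 1 and once in sense 2.
SAMPLE = """<entry><ent_seq>9999999</ent_seq>
<sense><xref>豚トロ・とんトロ・1</xref><xref>豚トロ・とんトロ・1</xref><gloss>x</gloss></sense>
<sense><xref>豚トロ・とんトロ・1</xref><gloss>y</gloss></sense>
</entry>"""


def duplicate_xrefs(entry):
    """Return (sense number, xref text, count) for xrefs repeated within one sense."""
    result = []
    for number, sense in enumerate(entry.iter("sense"), start=1):
        counts = Counter(x.text for x in sense.iter("xref"))
        result.extend((number, text, n) for text, n in counts.items() if n > 1)
    return result


print(duplicate_xrefs(ET.fromstring(SAMPLE)))
```

Only the repetition inside a single sense is reported; the same xref in a different sense of the entry is not counted as a duplicate.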

I have also noticed that there are about 4000 cross-references that specify a reading although the target entry has only one.

Last but not least, in the references to the following entries, the centre-dot (・) inside the target can mislead parsing software (and the DTD disallows it: "The target keb or reb must not contain a centre-dot."):

1. シルキー・シャーク
2. シンガポール・スリング
3. カーゴ・スリング
4. ベイビー・スリング
5. マヌカ・ハニー
6. タックス・ヘイブン
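
To show why this is a problem, here is a minimal sketch of a naive parser that treats the centre-dot as the field separator. The function split_xref is hypothetical and is not taken from any existing tool:

```python
def split_xref(text):
    """Naively split an xref on the centre-dot into (headword parts, sense number)."""
    parts = text.split("・")
    sense = None
    # A trailing all-digit part is taken to be the sense number.
    if parts[-1].isdecimal():
        sense = int(parts.pop())
    return parts, sense


print(split_xref("無線呼出符号・1"))
print(split_xref("シルキー・シャーク"))
```

The first call gives one headword and sense 1, as intended. The second gives two parts, so the single headword シルキー・シャーク would be read as a kanji/reading pair.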

Maybe the best solution would be for the xref to contain only XML-structured information (seq, type and an optional restriction to particular senses/kanji/readings). The restriction to kanji/readings could be expressed in much the same way as senses are now restricted using stagr/stagk, along these lines:

<!ELEMENT xref ((xtagk*, xtagr*)|xtags*)>
-- or, if a single kanji/reading/sense is enough:
<!ELEMENT xref (((xtagk, xtagr?)|xtagr|xtags)?)>

<!ATTLIST xref seq CDATA #REQUIRED>
<!ATTLIST xref type CDATA #IMPLIED>
	<!-- Type of cross-reference, implied value "see". -->
<!ELEMENT xtagk (#PCDATA)>
<!ELEMENT xtagr (#PCDATA)>
<!-- These elements, if present, indicate that the cross-reference is restricted to the lexeme represented by the keb and/or reb of the entry identified by xref's
	seq attribute. -->
<!ELEMENT xtags (#PCDATA)>
<!-- These elements, if present, indicate that the cross-reference is restricted to particular senses (represented by their numbers) of the entry identified by
	xref's seq attribute. -->
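
As a rough illustration of how such an xref might be consumed, here is a minimal Python sketch. The element and attribute names follow the proposal above; the seq value is a placeholder, and the function read_xref is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the proposed format; the seq value is a placeholder.
SAMPLE = '<sense><xref seq="9999999"><xtagr>シルキー・シャーク</xtagr></xref></sense>'


def read_xref(xref):
    """Return a dict describing one structured xref element."""
    return {
        "seq": xref.get("seq"),
        "type": xref.get("type", "see"),  # implied value when the attribute is absent
        "kanji": [e.text for e in xref.findall("xtagk")],
        "readings": [e.text for e in xref.findall("xtagr")],
        "senses": [int(e.text) for e in xref.findall("xtags")],
    }


sense = ET.fromstring(SAMPLE)
for xref in sense.iter("xref"):
    print(read_xref(xref))
```

Because the target is identified by seq and the restrictions are separate elements, a centre-dot inside a reading no longer needs to be interpreted as a separator.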

--
Adam Nohejl