
Unresolvable cross-references in JMdict



Gobusata shite imasu (it's been a while)...

I recently resurrected my efforts to get JMdict up and running on my
website, so I might be dropping in here occasionally for the next few
weeks.

I'm almost at the stage where I can turn the JMdict source file into a
MySQL database, but the latest version of the XML source file has a lot
of broken cross-references which are causing problems.

According to the DTD comments, the contents of <xref> and <ant>
elements "must exactly match that of a keb or reb element in another
entry". But a lot of the xref elements are of the form
"<xref>å½¼ã?»ã??ã??</xref>" (1000420) with the reb and keb data
separated by a nakaguro character. I can rig my software to ignore these
nakaguros and everything after them, but should this really be
necessary? Here's the list of entries containing xrefs of this sort:

1000420, 1001810, 1002500, 1002770, 1005160, 1006780, 1007030, 1011770,
1140360, 1154340, 1157230, 1169250, 1179005, 1191740, 1193080, 1198910,
1206100, 1231650, 1233560, 1238070, 1241450, 1249300, 1269080, 1270510,
1304960, 1328030, 1334990, 1352150, 1373250, 1374430, 1375190, 1392410,
1392580, 1430580, 1430580, 1457730, 1488250, 1495740, 1531190, 1551190,
1557840, 1557840, 1576075, 1578780, 1578780, 1578850, 1581730, 1582305,
1595120, 1620400, 1687910, 1811520, 1817160, 1855060, 1856710, 1858070,
1876230, 1900910, 1912520, 1927050, 2002660, 2006190, 2008930, 2018520,
2038230, 2057560, 2059710, 2063050, 2076920, 2080210, 2080220, 2081850,
2082140, 2084040, 2088480, 2096660, 2107350, 2110550, 2116670, 2126090,
2129880, 2138820, 2139720, 2142690, 2146460, 2147610, 2150170, 2158540,
2183270, 2200990, 2202770, 2202980, 2203220, 2206390, 2207590, 2208340,
2209690, 2209800, 2210370, 2210770, 2210780, 2211020, 2213500, 2219070,
2220300, 2221990, 2222000, 2222010, 2222180, 2222340, 2224460, 2227920,
2229880, 2229970, 2230270, 2230290, 2233080, 2239830, 2240170, 2240210,
2244870, 2247540, 2247760, 2247800, 2247820, 2247840, 2247950, 2248010,
2248030, 2248110, 2248130, 2251170, 2254210, 2254220, 2254230, 2254240,
2254250, 2254260, 2254270, 2254280, 2254290, 2254360, 2256880, 2258110,
2258150, 2258170, 2258540, 2259830, 2260900, 2261050, 2261080, 2262180,
2262270, 2262620, 2263630, 2264640, 2264650, 2265230, 2266150, 2266990,
2267210, 2267700, 2268150, 2268290, 2268750, 2269170, 2269820, 2270820,
2272810, 2272930, 2273430, 2275840, 2386360, 2394540, 2396350, 2397090,
2399550, 2399550, 9001470
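
For anyone hitting the same problem, here is a minimal sketch of a
workaround that is a little stricter than dropping everything after the
nakaguro. It assumes you have already collected every keb and reb string
into two sets; the function name and the cat sample data are
hypothetical and not part of JMdict or any library. It tries an exact
match first, because katakana words can legitimately contain a nakaguro,
and only then looks for a "keb, nakaguro, reb" split.

```python
NAKAGURO = "\u30fb"  # katakana middle dot


def resolve_xref(text, kebs, rebs):
    """Return a (keb, reb) pair for an xref string, or None if unresolved.

    kebs and rebs are sets holding every keb / reb string in the
    dictionary (assumed to be built beforehand by the caller).
    """
    # A well-formed xref matches a keb or reb exactly.
    if text in kebs:
        return (text, None)
    if text in rebs:
        return (None, text)
    # Otherwise try every nakaguro as a possible keb/reb separator.
    for i, ch in enumerate(text):
        if ch != NAKAGURO:
            continue
        head, tail = text[:i], text[i + 1:]
        if head in kebs and tail in rebs:
            return (head, tail)
    return None


# Illustrative data only
print(resolve_xref("猫・ねこ", {"猫"}, {"ねこ"}))
```

Keeping both halves of the split means the reading can be stored
alongside the kanji form, which may help when a keb has several
readings.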

The antonym cross-reference for entry 1414830 is unhelpful, not least
because there is no keb element corresponding to "小部":

<ant>小部</ant>
<ant>(of</ant>
<ant>a</ant>
<ant>book,</ant>
<ant>etc.)</ant>
<ant>lengthy</ant>

Entry 2210320 uses a zenkaku semicolon instead of a nakaguro:
<xref>ã?¾ã??ï¼?ã?ªã??</xref>.

Entry 2234570 has two <xref> elements: <xref>å¯?ç??</xref> and
<xref>sense-2</xref>. Obviously the second of these doesn't match
anything, although I can see what was intended. Maybe there's a better
way of associating cross-references with particular keb/reb/sense
combinations?

Some of the cross-references just don't seem to go anywhere:

ã?³ã?³ã??ã?¤ã?³ã??ã?¼ã??ã?¹ã?¿ã?¼ (1052990)
����� (1160100)
以å`?æ³¢æ­? (1233320)
é?'æ¯? (1381380)
æ?¯ã??å??ã?? (1404320)
å??è?®ç?® (1460500)
å`¼ã?³æ¨? (1574790)
å¿?ã??ã??ã?°ã??ã?? (1640050)
ã??è²·ã??ä¸?ã?' (1752850)
大�天 (1786600)
����� (1788820)
è??ã?®ã??ã?? (1860340)
ã??ã?¥ã?¼ã?¯ã?ªã?¢ã??ã?¡ã??ã?ª (2072660)
é?·ã??ã?? (2078690)
å?£æ?°ã?®å¤?ã??ã?? (2088160)
è¨?ã??ã??ã??ã??ã?ªï¼? (2208130)
太宰治 (2221040)
�����止 (2229140)
ç?»è?? (2231420)
ã??ã??ã??æ?ªã?? (2238320)
赤家è?? (2240890)
æ??å¾?ã?¡ (2244940)
ç"?ç"£å·¥ç¨?è?? (2248540)
ã?«ã?"ã?«ã?" (2255280)
å??é¡?è¦?,ç?`å??è¦? (2257780)
é»'ã?"ã?¼ (2258930)
æ±?ã? ã?? (2267130)
ç?¼ã?`ã??ã?¦ (2267330)
ä¸?é¨?å½"å??ã?®æ­¦è?? (2267900)
å­?å®?ã?? (2268160)
ã??ã??æ³¢ (2268910)
äº"å¢? (2270420)
御�� (2271360)
御�� (2271370)
å?ºè?¥ã?? (2275150)
正�形� (2348450)
ä¼?æ?"æ?§ç´?æ?` (2396630)
è?³å?? (2397080)
æ??ã?®å±?ã?? (2400050)
é??æ??æ§? (2400960)
深山å«?è?? (2401120)
è??è??é?? (2401260)
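
A check like the one behind these lists can be sketched in a few lines
of Python with the standard xml.etree.ElementTree module. This is only
an illustration, not the tool used for this post: the function name and
the two-entry sample document are made up, and only the element names
(entry, ent_seq, keb, reb, xref, ant) come from the JMdict DTD. It also
skips the DTD's requirement that the match be in *another* entry, and
simply reports any xref or ant whose text matches no keb or reb at all.

```python
import xml.etree.ElementTree as ET


def find_dangling_refs(root):
    """Yield (ent_seq, tag, text) for each xref/ant matching no keb or reb."""
    # Collect every keb and reb string in the whole document.
    targets = set()
    for tag in ("keb", "reb"):
        for el in root.iter(tag):
            targets.add(el.text)
    # Report cross-references whose text is not among them.
    for entry in root.iter("entry"):
        seq = entry.findtext("ent_seq")
        for tag in ("xref", "ant"):
            for el in entry.iter(tag):
                if el.text not in targets:
                    yield (seq, tag, el.text)


# Toy document for illustration; the real file is far larger.
SAMPLE = """<JMdict>
<entry><ent_seq>1</ent_seq><k_ele><keb>猫</keb></k_ele><r_ele><reb>ねこ</reb></r_ele>
<sense><xref>犬</xref><xref>猫・ねこ</xref><gloss>cat</gloss></sense></entry>
<entry><ent_seq>2</ent_seq><k_ele><keb>犬</keb></k_ele><r_ele><reb>いぬ</reb></r_ele>
<sense><ant>sense-2</ant><gloss>dog</gloss></sense></entry>
</JMdict>"""

root = ET.fromstring(SAMPLE)
dangling = list(find_dangling_refs(root))
for seq, tag, text in dangling:
    print(seq, tag, text)
```

On the toy document this flags the nakaguro-style xref in entry 1 and
the "sense-2" reference in entry 2, while the plain xref to 犬 passes.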

Sorry about the long post, but I hope this comes in useful.

Phil Ronan // japanesetranslator.co.uk