Re: [edict-jmdict] pos/misc entities
Glenn Maynard wrote:
> A lot of people use this data. Am I the only person having trouble
> with this?
No, I have had trouble too.
In my case, though, I always map the JMdict entities to
application-specific values since I want to decouple my
application code from changes in JMdict. Since I do that
mapping anyway, it doesn't really matter if the input is
the entity strings, or their expanded values.
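For illustration, here is a minimal Python sketch of that kind of decoupling layer. `PartOfSpeech`, `map_pos`, and the table contents are hypothetical names made up for this example, and the expanded strings are assumed to match the JMdict DTD (check them against your copy). Because the table can hold both forms as keys, it works whether the parser yields entity names or their expanded values.

```python
from enum import Enum


class PartOfSpeech(Enum):
    """Hypothetical application-side categories, independent of JMdict."""
    I_ADJECTIVE = "i-adjective"
    NOUN = "noun"
    UNKNOWN = "unknown"


# Hypothetical mapping table. Keys are whatever the parser hands us: either
# the entity name ("adj-i") or its expanded text. The expanded strings are
# an assumption about the JMdict DTD; verify them against your copy.
_POS_MAP = {
    "adj-i": PartOfSpeech.I_ADJECTIVE,
    "adjective (keiyoushi)": PartOfSpeech.I_ADJECTIVE,
    "n": PartOfSpeech.NOUN,
    "noun (common) (futsuumeishi)": PartOfSpeech.NOUN,
}


def map_pos(raw: str) -> PartOfSpeech:
    # Unrecognized values fall back to UNKNOWN instead of raising, so a new
    # or renamed JMdict entity does not break the application code.
    return _POS_MAP.get(raw.strip(), PartOfSpeech.UNKNOWN)
```

Only the table has to change when JMdict renames or rewords an entity; the rest of the application keeps using `PartOfSpeech`.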
I once tried to get Python's ElementTree XML parser (and
Expat, which it uses to do the low-level parsing) to use
a custom entity resolver, but I never got it to work.
What I've done lately is to pass ElementTree my own little
custom file reader object. The reader preprocesses each line of
the XML file before feeding it to the parser and strips the "&"
and ";" from all but the standard entities (&amp; and friends),
so the XML parser sees, for example, "<pos>adj-i</pos>" rather
than "<pos>&adj-i;</pos>". Bit of a hack, but it seems to be working OK.