
Re: [edict-jmdict] pos/misc entities



Glenn Maynard wrote:
A lot of people use this data.  Am I the only person having trouble
with this?

No, I have too.

In my case though, I always map the JMdict entities to
application-specific values since I want to decouple my
application code from changes in JMdict.  Since I do that
mapping anyway, it doesn't really matter if the input is
the entity strings, or their expanded values.
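For illustration, such a mapping layer might look like the sketch below. The
PartOfSpeech enum, the lookup table, and to_app_pos() are hypothetical names
invented for this example, not anything JMdict or my code defines, and the two
entries shown are only a sample of the JMdict pos values.

```python
from enum import Enum


class PartOfSpeech(Enum):
    """Hypothetical application-side values, independent of JMdict naming."""
    UNKNOWN = 0
    ADJ_I = 1
    NOUN = 2


# Illustrative table: both the entity name and its expanded text map to the
# same application value, so either form of input works.
_POS_TABLE = {
    "adj-i": PartOfSpeech.ADJ_I,
    "adjective (keiyoushi)": PartOfSpeech.ADJ_I,
    "n": PartOfSpeech.NOUN,
    "noun (common) (futsuumeishi)": PartOfSpeech.NOUN,
}


def to_app_pos(value):
    """Map a JMdict <pos> string to an application value (UNKNOWN if unmapped)."""
    return _POS_TABLE.get(value.strip(), PartOfSpeech.UNKNOWN)
```

If JMdict renames an entity or rewords an expansion, only the table needs to
change; the rest of the application keeps using its own values.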

I once tried to get Python's ElementTree XML parser (and
Expat which it uses to do the low-level parsing) to use
a custom entity resolver but never got it to work.

What I've done lately is to pass ElementTree my own little
custom file reader object.  The reader preprocesses each line
of the XML file before feeding it to the parser, stripping the
"&" and ";" from all but the standard entities (&amp; and friends)
so the XML parser sees, for example, "<pos>adj-i</pos>" rather
than the entity reference.  Bit of a hack but seems to be working ok.
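A minimal sketch of that kind of reader, assuming Python 3 and the standard
library's xml.etree.ElementTree. This is a reconstruction of the idea rather
than my actual code: EntityStrippingReader, strip_custom_entities() and the
regular expression are names and choices made up for the example.

```python
import io
import re
import xml.etree.ElementTree as ET

# The five entities predefined by XML itself; these are passed through as-is.
STANDARD_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}

# Named entity references such as "&adj-i;".  Numeric character references
# like "&#x3042;" do not match because "#" cannot start a name here.
ENTITY_RE = re.compile(r"&([A-Za-z_][\w.-]*);")


def strip_custom_entities(line):
    """Replace "&name;" with "name" unless it is a standard XML entity."""
    def repl(match):
        name = match.group(1)
        return match.group(0) if name in STANDARD_ENTITIES else name
    return ENTITY_RE.sub(repl, line)


class EntityStrippingReader:
    """File-like wrapper; ElementTree only needs a read() method."""

    def __init__(self, fileobj):
        self._lines = iter(fileobj)

    def read(self, size=-1):
        # Hand back one preprocessed line per call ("size" is ignored);
        # an empty string tells the parser the input is exhausted.
        return strip_custom_entities(next(self._lines, ""))


# Small made-up sample in the shape of a JMdict file.
SAMPLE = (
    '<!DOCTYPE JMdict [\n'
    '<!ENTITY adj-i "adjective (keiyoushi)">\n'
    ']>\n'
    '<JMdict>\n'
    '<entry><pos>&adj-i;</pos><gloss>salt &amp; pepper</gloss></entry>\n'
    '</JMdict>\n'
)

root = ET.parse(EntityStrippingReader(io.StringIO(SAMPLE))).getroot()
pos_text = root.find("entry/pos").text      # the bare entity name
gloss_text = root.find("entry/gloss").text  # standard entity still expanded
```

For a real file, the StringIO object would be replaced by something like
open(path, encoding="utf-8"), where path is wherever the JMdict file lives.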