Re: [edict-jmdict] Diacritics in lsrc

To: edict-jmdict@***************
Subject: Re: [edict-jmdict] Diacritics in lsrc
From: Ben Bullock <benkasminbullock@*********>
Date: Tue, 6 Jul 2010 16:53:09 +0900

On 6 July 2010 16:30, Jeroen Hoek <mail@jeroenhoek.nl> wrote:

> Yeah, that should use the same Unicode data as a source.

The Unicode data keeps changing from version to version, so if the
exact results matter you need to check which version each tool uses.
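
One way to do that check, as a minimal sketch in Python (assuming a
Python with the standard unicodedata module; the helper name
decomposition_report is made up for illustration): print the Unicode
version the interpreter was built against and the decomposition
mappings of a few characters you care about, then compare the output
between machines.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this interpreter.
print("Unicode data version:", unicodedata.unidata_version)

def decomposition_report(chars):
    """Map each character to its decomposition mapping (hypothetical helper).

    The mapping is a string of hex code points, or "" if the character
    does not decompose.
    """
    return {ch: unicodedata.decomposition(ch) for ch in chars}

# Sample characters: e-acute, HIRAGANA GA, plain "i".
for ch, mapping in decomposition_report("\u00e9\u304ci").items():
    print("U+%04X" % ord(ch), repr(mapping))
```

Saving that output on each machine and diffing it is a cheap way to see
whether two tools agree on the characters that matter to you.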

On 6 July 2010 16:29, Jim Breen <jimbreen@gmail.com> wrote:

> I see the old Suns have Perl 5.6.1. If I get nowhere with the Python
> upgrade, I'll see if I can stir the Perl along.

I am very sorry, but the thing I posted has not even a tiny hope of
working on Perl 5.6. You would need to investigate some legacy modules
in that case.

On 6 July 2010 16:38, Jeroen Hoek <mail@jeroenhoek.nl> wrote:
> On 6 July 2010 09:29, Jim Breen <jimbreen@gmail.com> wrote:
>> Interesting, and a bit of a shock to see it also whips the にごり
>> marks off kana as well. (It left the dots in the tops of "i" though  8-)})
>
> Logical if you think about it; nigori are diacritic-ish.

That sample program is just blanket-applying the Unicode decomposition
of each character to strip out anything which can be stripped out.
There is lots of support for pattern matching to select or exclude
sets of characters if you need it.
