Re: Jreibun, new sentences database
> Hi everyone, Kim from Jisho.org here.
Hi there,
I'm not dead. Although I'm not much healthier either.
> Professor Suzuki Tomomi at Tokyo University of Foreign Studies, together
> with a research team, is embarking on a project to create an open database
> of high quality Japanese-English example sentences geared towards Japanese
> learning apps and sites.
>
> The project is based on research done at TUFS around study resources used
> by Japanese learners.
>
> To quote the project:
>
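> Have you ever looked at the dictionary apps that Japanese learners use and
> wished their example sentences were better? In this KAKENHI research
> project, we will build a bank of example sentences that are high quality
> from the perspective of Japanese language education, and release it as open
> data so that it can be used in developing apps and websites.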
>
> They are doing their first seminar for the project over Zoom on Sunday July
> 18 at 1pm (Japan time) to explain the background and aim of the project,
> and to solicit volunteers. I’m attaching the event flyer. Signups close on
> July 10th.
>
> Hopefully this project will be useful for many of you on this mailing list,
> and if it sounds interesting I encourage you to join the seminar.
>
> My connection to this project comes from studying under Suzuki many years
> ago, and helping out in the early planning stage of the project by
> answering technical questions around how Jisho uses example sentences from
> the Tanaka corpus.
>
> I have high hopes for Jreibun, and have already agreed to use the sentences
> in Jisho. I'm also considering creating an open set of JMdict-Jreibun
> mappings similar to the "good sentences" (the ones marked ~) in the Tanaka
> corpus.
I'd be interested to know how you/they plan to deal with linking
words and senses to the actual example sentences. One of the things
I did back when I had more energy was update the links between JMdict
and the example sentences on a monthly basis, so I know some of the
gotchas waiting for you.
The theoretical ideal is that every part of a Japanese example
sentence is linked to a unique word/phrase in the dictionary, or is
explicitly excluded as 'white space / junk text', in a sort of
Orwellian "everything not forbidden is compulsory" sense. My (sadly
obsolete) files kept track of junk text / white space, which wasn't
really handled elsewhere. Doing so helps avoid spurious matches, which
can lead to the wrong words being highlighted in example sentences,
among other problems.
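A minimal sketch of that "everything accounted for" check, assuming (hypothetically) that each annotation is a character span pointing either at a dictionary entry or at nothing, where nothing means explicitly marked junk. The `Span` type, the entry ids, and the function names are illustrative only, not any existing JMdict or Jreibun format:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    start: int               # inclusive character offset into the sentence
    end: int                 # exclusive character offset
    entry_id: Optional[int]  # hypothetical dictionary entry id; None = junk / white space


def coverage_errors(sentence: str, spans: List[Span]) -> List[int]:
    """Return offsets covered by no span (unaccounted text) or by several (overlap)."""
    counts = [0] * len(sentence)
    for span in spans:
        for i in range(span.start, span.end):
            counts[i] += 1
    return [i for i, n in enumerate(counts) if n != 1]


def highlight_ranges(spans: List[Span], entry_id: int) -> List[Tuple[int, int]]:
    """Ranges to highlight for one entry; junk spans (entry_id None) never match."""
    return [(s.start, s.end) for s in spans if s.entry_id == entry_id]


sentence = "猫が好きです。"
spans = [
    Span(0, 1, 1001),  # 猫
    Span(1, 2, 1002),  # が
    Span(2, 4, 1003),  # 好き
    Span(4, 6, 1004),  # です
    Span(6, 7, None),  # 。 explicitly marked as junk
]
print(coverage_errors(sentence, spans))       # → []
print(coverage_errors(sentence, spans[:-1]))  # → [6]  (the 。 is unaccounted for)
print(highlight_ranges(spans, 1003))          # → [(2, 4)]
```

Treating junk as an explicit annotation, rather than as the absence of one, is what lets the check distinguish "deliberately ignored" from "never looked at".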
One point that was a lot of trouble to work with, and not handled as
well as it could have been, was tracking when dictionary entries had
senses added, merged, or removed. I would recommend planning for this
from the start, as it happens a lot.
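One possible way to handle this, sketched under the assumption that each link is stored as a hypothetical (sentence id, entry id, sense number) triple and that each dictionary release can be diffed into a per-entry map from old sense numbers to new ones. Merges map several old senses to one new sense, and removals map to None. Nothing here reflects how JMdict itself publishes changes:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical link record: (sentence_id, entry_id, sense_number)
Link = Tuple[int, int, int]


def remap_links(
    links: List[Link],
    sense_maps: Dict[int, Dict[int, Optional[int]]],
) -> Tuple[List[Link], List[Link]]:
    """Apply per-entry old->new sense maps; return (updated links, links needing review).

    sense_maps[entry_id][old_sense] is the new sense number, or None if the
    sense was removed. Entries absent from sense_maps are assumed unchanged.
    """
    updated: List[Link] = []
    review: List[Link] = []
    for sentence_id, entry_id, sense in links:
        mapping = sense_maps.get(entry_id)
        if mapping is None:
            # entry untouched in this release: keep the link as-is
            updated.append((sentence_id, entry_id, sense))
        elif mapping.get(sense) is None:
            # sense removed (or missing from the map): a human must re-check
            review.append((sentence_id, entry_id, sense))
        else:
            updated.append((sentence_id, entry_id, mapping[sense]))
    return updated, review


links = [(1, 1001, 1), (2, 1001, 2), (3, 1001, 3), (4, 1002, 1)]
# Entry 1001: senses 1 and 2 merged into new sense 1, sense 3 removed.
updated, review = remap_links(links, {1001: {1: 1, 2: 1, 3: None}})
print(updated)  # → [(1, 1001, 1), (2, 1001, 1), (4, 1002, 1)]
print(review)   # → [(3, 1001, 3)]
```

Routing removed senses to a review list, rather than silently dropping or guessing, keeps sentence links from drifting onto the wrong meaning from one monthly update to the next.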
Best regards,
Paul Blay
--
You received this message because you are subscribed to the Google Groups "EDICT-JMdict" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/edict-jmdict/CANNhP54rovgEt_keqG55xyru-J07mhoSWYF-scAZ6tp6sTZUyQ%40mail.gmail.com.