
editorial policy?



Hello everybody,

about 8 years ago i became aware of JEDICT and started using it, and over the years, from time to time, i have sent Jim comments and suggestions regarding specific entries. Last year i joined this mailing list, and although most of what is being talked about here is over my head (i don't deal with databases or programming), i appreciate the concerted effort aimed at improving this resource in many ways.

Improving the interface in response to new insights or changing user habits, adding lookup methods (additional features) that make better use of, or allow for different use of, the existing data, and providing convenient channels for submissions are all improvements that i appreciate. Where they require a new way of doing something on the part of users, i am more than willing to adjust, since i am primarily a user of this free resource.

I must tell you of my growing concern, however, that the integrity of the data itself appears to me to be suffering in a way that has perhaps not been anticipated with all these welcome improvements.

Probably everybody on this list is aware of the fact that creating a dictionary, or even just a glossary, is more than collecting data and indexing them in various ways - one essential part of the process is to verify the validity and usefulness of the data from a linguistic perspective and to present them in ways that make sense to the users. But who validates the data and how, by what process and under what criteria, are they validated? And who checks with users as to their assumptions, expectations, and observations regarding the comprehensiveness and reliability of the data being presented in response to a query?

Perhaps as a consequence of my own progress in using the Japanese language, i feel increasingly frustrated by the disparities i find between EDICT and other online dictionaries, starting with WaDoku, which i can access through the same interface. These disparities are in some cases substantial differences in the basic meaning of terms being offered, in other cases incomplete entries, and in many cases simply missing entries of items that i encounter in contexts that one could not exactly call obscure. This is the first issue that i think warrants some editorial attention: a comprehensive expansion of the database instead of the piecework done by all the volunteers (see my further comments on that near the end).

Another point of concern has to do with the recently established submission interface and the associated possibility of anybody "live-updating" the database: for several reasons i am usually not in a position to submit (assumed) corrections or updates at the time i notice a (presumed) error or missing information but instead tend to collect such "odd entries" for later processing and submission. Counterintuitive as it may seem to some, since the database can be "live-updated" i don't feel much inclined anymore to write out and submit corrections, because the database is liable to change daily, and in order to avoid confusion and inconsistencies, i would have to check a second time in each case whether the point i want to submit is actually still needed.

And one more issue: i have difficulties getting my head around the enormous range of inconsistencies that i encounter when using two different functions of the JEDICT interface, one being the kanji lookup, which appears to make use of a file called "glossdic", and the other the word lookup, which appears to make use of a file called "edict". I am not sure what technical issues are at hand here, but from the point of view of a user and of someone with a background in linguistics and translation, i find these inconsistencies disconcerting.

The next to last item on my list (and i know this is being actively addressed as we speak) is the Tanaka corpus. I find the copious repetitions of sentences with only insignificant differences quite annoying and also take issue with many of the translations proffered - this is apparently nothing new to the members of this list, but it is really a problem that needs to be addressed by linguists and not by code or database specialists. In the meantime, whenever i want to get a handle on usage questions, i use the much more concise and better translated corpus from EIJIRO and the Yahoo and Kenkyusha dictionaries.

And then the last point: are there any native speakers of Japanese on this list? :-) I am asking because i don't recall having seen any post yet by anyone who one might presume to be a native speaker, perhaps by virtue of their name. In any case, it would seem obvious to me that there should be native speakers of _both_ languages among the editors of any bilingual glossary or dictionary.

Considering that all the work being done is done by volunteers, it feels almost cheap to complain about anything, but i think that all who put in their time and skill would hate to see their work being nullified by an increasing erosion of the quality of the data itself. And although i cannot contribute much to the kind of work being undertaken by said volunteers, i hope that my concerns and suggestions are seen as they are intended, as constructive contributions to the creative process, not as mere complaints. And since it is also quite likely that the points i have raised here have previously been addressed in some other context, i would like to see this collection of ideas interpreted as a kind of spotcheck summary from one user's perspective, completely without prejudice or insinuations of any kind.

Some suggestions:

- We need to get linguist volunteers on board who will do editorial work

- We should stop the "live-updating" of the database until linguist editors are at hand to implement such updates based on a (yet to be established) linguistically consistent editorial policy - i honestly don't think that a Wikipedia approach is the best approach for a dictionary as a whole, although having a Wiki section attached to an existing dictionary would seem very useful

- As i had mentioned, i see a need for a comprehensive expansion of the database, but what i don't know is whether JEDICT is the only non-proprietary database of this kind - if it is, then the needed data will likely not become easily available, and one may have to live with the shortcomings of what one has. But if there are other non-proprietary databases, it would seem very desirable to actively aim for cooperation with those people who have their hands on other databases, with the objective of infusing needed new data into the current database in a comprehensive way (all done with editorial oversight, of course)

* * *

Here is one illustrative example of where a linguistic editor's hand is needed and useful:

割り勘(P); 割勘 【わりかん】 (n) (See 割り前勘定) (abbr) splitting the cost; Dutch treat; (P)
割り勘負け; 割勘負け 【わりかんまけ】 (n) (See 割り勘勝ち) (sl) "loser" of a meal paid for by dutch treat (i.e. the person who eats the least)   
割り勘勝ち; 割勘勝ち 【わりかんがち】 (n) (See 割り勘負け) (sl) "winner" of a meal paid for by dutch treat (i.e. the person who eats the most)   
割り前勘定; 割前勘定 【わりまえかんじょう】 (n) (See 割り勘) each paying for his own account; sharing the expenses; Dutch treat

Some comments from a linguistic perspective regarding the above entries:

1) "see ....." usually does not mean "opposite meaning" or "different meaning"; for that one should better use expressions like "contrast with" or "opposite"

2) Someone needs to decide what "Dutch treat" really means: Is it "each [person] paying for his own account" or is it "splitting the cost [evenly] / [(if more extensive:) of a meal evenly among the guests]", which is the same as "sharing the expenses [equally / in equal parts] / [(if more extensive:) of a meal equally among the guests]"?

3) Someone needs to decide whether the headword in the translated part will be "Dutch treat", which is a colloquialism - considered inappropriate, if not offensive by some people, thus better given a suitable marker -, or the plain language denotation of 割り勘 and 割り前勘定 and so on.

As i just suggested, these are not issues to be left to people whose speciality is coding, database management, or interface maintenance. As important and valuable as the latter skills are in the overall process of increasing the usefulness of this database, such issues are the task of people whose specialities might be linguistics, translation, language teaching, and the like.

* * *

Many thanks to Jim and everybody else, and best regards from Okinawa: Hendrik

--


*   南風言語業(大工ヘンドリク)  *
http://www.paikaji-translation.com/

--