JMdictDB - Entries

jmdict 1385120 Active (id: 2204821)
<entry id="2204821" stat="A" corpus="jmdict" type="jmdict">
<ent_corp type="jmdict">jmdict</ent_corp>
<ent_seq>1385120</ent_seq>
<k_ele>
<keb>切断</keb>
<ke_pri>ichi1</ke_pri>
<ke_pri>news2</ke_pri>
<ke_pri>nf25</ke_pri>
</k_ele>
<k_ele>
<keb>截断</keb>
<ke_inf>&rK;</ke_inf>
</k_ele>
<k_ele>
<keb>接断</keb>
<ke_inf>&sK;</ke_inf>
</k_ele>
<r_ele>
<reb>せつだん</reb>
<re_pri>ichi1</re_pri>
<re_pri>news2</re_pri>
<re_pri>nf25</re_pri>
</r_ele>
<sense>
<pos>&n;</pos>
<pos>&vs;</pos>
<pos>&vt;</pos>
<gloss>cutting</gloss>
<gloss>severance</gloss>
<gloss>section</gloss>
<gloss>amputation</gloss>
<gloss>disconnection</gloss>
</sense>
<info>
<audit time="2014-01-27 11:43:22" stat="A" unap="true">
<upd_uid>jwb</upd_uid>
<upd_name>Jim Breen</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>Merging 1567110.</upd_detl>
<upd_refs>Koj, Daijr, Wadoku (all merge).</upd_refs>
<upd_diff>@@ -8,0 +9,3 @@
+&lt;/k_ele&gt;
+&lt;k_ele&gt;
+&lt;keb&gt;截断&lt;/keb&gt;</upd_diff>
</audit>
<audit time="2014-01-27 17:17:45" stat="A">
<upd_uid>rene</upd_uid>
<upd_name>Rene Malenfant</upd_name>
<upd_email>...address hidden...</upd_email>
</audit>
<audit time="2014-01-27 17:18:33" stat="A" unap="true">
<upd_uid>rene</upd_uid>
<upd_name>Rene Malenfant</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>not in any of my sources; i suspect it is [iK].  probably gets enough non-edict web hits to be worth recording though</upd_detl>
<upd_refs>merge</upd_refs>
<upd_diff>@@ -11,0 +12,4 @@
+&lt;/k_ele&gt;
+&lt;k_ele&gt;
+&lt;keb&gt;接断&lt;/keb&gt;
+&lt;ke_inf&gt;&amp;iK;&lt;/ke_inf&gt;</upd_diff>
</audit>
<audit time="2014-01-27 17:21:12" stat="A" unap="true">
<upd_uid>rene</upd_uid>
<upd_name>Rene Malenfant</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_refs>daij, shinmeikai, my ime</upd_refs>
<upd_diff>@@ -22,0 +23,4 @@
+&lt;r_ele&gt;
+&lt;reb&gt;さいだん&lt;/reb&gt;
+&lt;re_restr&gt;截断&lt;/re_restr&gt;
+&lt;/r_ele&gt;</upd_diff>
</audit>
<audit time="2014-01-27 21:58:43" stat="A">
<upd_uid>jwb</upd_uid>
<upd_name>Jim Breen</upd_name>
<upd_email>...address hidden...</upd_email>
</audit>
<audit time="2021-11-07 01:02:11" stat="A">
<upd_uid>Marcus</upd_uid>
<upd_name>Marcus Richert</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_diff>@@ -29,0 +30 @@
+&lt;pos&gt;&amp;vt;&lt;/pos&gt;</upd_diff>
</audit>
<audit time="2022-07-12 05:07:23" stat="A" unap="true">
<upd_name>Stephen Kraus</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>HiddenForm 接断[iK]</upd_detl>
<upd_refs>Google N-gram Corpus Counts
╭─ーーーー─┬───────────┬───────╮
│ 切断　　 │ 1,573,771 │ 99.9% │
│ 截断　　 │     1,015 │  0.1% │ 🡠 daijr/s, etc.
│ 接断　　 │       425 │  0.0% │ 🡠 drop and hide (no refs)
│ せつだん │     2,167 │  N/A  │
│ さいだん │     2,955 │  N/A  │
╰─ーーーー─┴───────────┴───────╯</upd_refs>
<upd_diff>@@ -12,4 +12 @@
-&lt;/k_ele&gt;
-&lt;k_ele&gt;
-&lt;keb&gt;接断&lt;/keb&gt;
-&lt;ke_inf&gt;&amp;iK;&lt;/ke_inf&gt;
+&lt;ke_inf&gt;&amp;rK;&lt;/ke_inf&gt;</upd_diff>
</audit>
<audit time="2022-07-12 06:24:42" stat="A">
<upd_uid>jwb</upd_uid>
<upd_name>Jim Breen</upd_name>
<upd_email>...address hidden...</upd_email>
</audit>
<audit time="2022-08-14 11:15:35" stat="A">
<upd_uid>jwb</upd_uid>
<upd_name>Jim Breen</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>Adding sK/sk forms.</upd_detl>
<upd_diff>@@ -12,0 +13,4 @@
+&lt;/k_ele&gt;
+&lt;k_ele&gt;
+&lt;keb&gt;接断&lt;/keb&gt;
+&lt;ke_inf&gt;&amp;sK;&lt;/ke_inf&gt;</upd_diff>
</audit>
<audit time="2022-08-14 13:07:15" stat="A" unap="true">
<upd_uid>robin1354</upd_uid>
<upd_name>Robin Scott</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>截断 is only rK for せつだん. For さいだん, it's the only kanji form. I think さいだん should be a separate entry.</upd_detl>
</audit>
<audit time="2022-08-14 16:33:22" stat="A" unap="true">
<upd_name>Stephen Kraus</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>The kokugos (daijr/s, shinmeikai) describe さいだん as 慣用読み and merely redirect to せつだん. Doesn't seem to have any distinct meaning from せつだん.

What is at stake by keeping さいだん in this entry? It's not clear to me how this could cause confusion.

By splitting the reading into a separate entry, I think we'd be making that information more difficult to discover. As it is right now, the info can be summarized into a single table:
https://gist.github.com/stephenmk/7da3f54216bfe5018706651e13efba2d</upd_detl>
</audit>
<audit time="2022-08-14 23:44:02" stat="A" unap="true">
<upd_uid>robin1354</upd_uid>
<upd_name>Robin Scott</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>Indeed, there's no difference in meaning. But 截断 is not an rK form for さいだん, only せつだん. I propose keeping 截断 on this entry, tagged as rK, and splitting out さいだん into a separate entry where 截断 is not tagged as rK. This would also allow us to tag さいだん as "rare".</upd_detl>
</audit>
<audit time="2022-08-15 02:26:44" stat="A" unap="true">
<upd_name>Stephen Kraus</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>I thought the [rK] tags signify that surface forms are rare for their respective entries rather than for particular readings. That's why we don't put [rK] tags on forms which have unique sense restrictions.

Outside of specific cases such as this one (in which the reading restriction is provided by the refs), I don't think we can adhere to that standard. For example, if in actuality 90% of the さいだん usages belonged to 截断 (rather than 100%), we probably wouldn't split 截断・さいだん into a separate entry. In general I don't know how we'd even go about measuring those sorts of usage ratios.

I don't think splitting the entry adds any new information that can't already be inferred from the existing entry. The conventional wisdom with databases is to avoid data duplication whenever possible.</upd_detl>
</audit>
<audit time="2022-08-15 05:56:42" stat="A" unap="true">
<upd_uid>jwb</upd_uid>
<upd_name>Jim Breen</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>And just to confuse things, there is 裁断/さいだん/cutting (1296100 sense 1).  If we are moving さいだん from this entry, perhaps it should go in there?</upd_detl>
</audit>
<audit time="2022-08-15 22:38:17" stat="A" unap="true">
<upd_name>Stephen Kraus</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>Sense [1] of 裁断 seems to have a specific cloth / paper nuance
which isn't mentioned in the せつだん entries. But since the サイ
reading of 截 seems to originate from 裁, I don't think 截断
would be out-of-place in the 裁断(さいだん) entry.</upd_detl>
<upd_refs>Daijs has this to say in its entry for 截(せつ):
「截」を「サイ」と読むのは「裁」などとの混同による。</upd_refs>
</audit>
<audit time="2022-08-15 23:04:39" stat="A" unap="true">
<upd_uid>robin1354</upd_uid>
<upd_name>Robin Scott</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>All the kokugos make reference to cloth and paper for 裁断. I don't think a merge would be appropriate. It would also mean having to tag 截断 as an irregular kanji form.

&gt; I thought the [rK] tags signify that surface forms are rare for their respective entries rather than for particular readings.
That's correct. But this is precisely why I suggest splitting out さいだん. By tagging a surface form as rK, we're marking it as "rare" for all readings and senses that apply to that form. It's for this reason that we don't tag a form as rK if it's above the 3% threshold for just a particular sense.

&gt; I don't think splitting the entry adds any new information that can't already be inferred from the existing entry. 
Perhaps, but I don't think the entry in its current state is consistent with the definition of rK. We essentially have two words here (せつだん and さいだん), only one of which is rarely written as 截断. We split off はおうじゅ from サボテン for the same reason.

&gt; The conventional wisdom with databases is to avoid data duplication whenever possible.
For years, this was more or less JMdict's policy, with even the slightest overlap being used as justification for merging forms. But it resulted in some very messy and hard-to-read entries, and we've since undone a lot of these merges. We permit some degree of duplication for the sake of clarity and readability.</upd_detl>
</audit>
<audit time="2022-08-16 00:56:04" stat="A" unap="true">
<upd_name>Stephen Kraus</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_refs>仙人掌 is currently tagged as [rK] for the reading シャボテン on entry 1387180. シャボテン has two surface forms: シャボテン and 仙人掌. Can we say with confidence that 仙人掌 is truly [rK] for シャボテン? It probably is (since 仙人掌 can also be read as サボテン and せんにんしょう), but strictly speaking I don't think we have any way to measure these usages. This talk about surface forms being rare for particular readings makes me kind of uneasy. I think it's safer to think of [rK] as being strictly "for the entry" rather than also "for the readings."

I agree that duplicating / splitting entries is a good practice when it enhances clarity. Unlike the サボテン entry (which is quite a bit busier), I personally don't see the benefit for this 切断 entry; I think the separation would just make it more disorganized. If it were up to me, my preference would be to keep the entry as it currently is.</upd_refs>
</audit>
<audit time="2022-08-16 01:30:09" stat="A">
<upd_uid>jwb</upd_uid>
<upd_name>Jim Breen</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>Hmmm. OK, maybe a split is best. I've created a "rare" 截断/さいだん entry. I'll close this and reopen briefly for any possible discussion.</upd_detl>
<upd_diff>@@ -9,4 +8,0 @@
-&lt;/k_ele&gt;
-&lt;k_ele&gt;
-&lt;keb&gt;截断&lt;/keb&gt;
-&lt;ke_inf&gt;&amp;rK;&lt;/ke_inf&gt;
@@ -23,4 +18,0 @@
-&lt;/r_ele&gt;
-&lt;r_ele&gt;
-&lt;reb&gt;さいだん&lt;/reb&gt;
-&lt;re_restr&gt;截断&lt;/re_restr&gt;</upd_diff>
</audit>
<audit time="2022-08-16 01:30:29" stat="A" unap="true">
<upd_uid>jwb</upd_uid>
<upd_name>Jim Breen</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>Reopen.</upd_detl>
</audit>
<audit time="2022-08-16 01:37:20" stat="A" unap="true">
<upd_name>Stephen Kraus</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>截断 is still a kanji form for せつだん.</upd_detl>
<upd_diff>@@ -8,0 +9,4 @@
+&lt;/k_ele&gt;
+&lt;k_ele&gt;
+&lt;keb&gt;截断&lt;/keb&gt;
+&lt;ke_inf&gt;&amp;rK;&lt;/ke_inf&gt;</upd_diff>
</audit>
<audit time="2022-08-16 01:45:54" stat="A" unap="true">
<upd_uid>robin1354</upd_uid>
<upd_name>Robin Scott</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>I think we can say with confidence that 仙人掌 is [rK] for シャボテン. サボテン is written overwhelmingly in kana. There's no reason to believe that シャボテン would be any different.

&gt; I think it's safer to think of [rK] as being strictly "for the entry" rather than also "for the readings."
I don't think there's a meaningful distinction between "the entry" and "the readings". Readings are words, and kanji are just a way of representing those words in writing. When an entry merges readings, it's really merging different words. To say that a surface form is rK "for the entry" is to say that it's rK for the words the form corresponds to. In the case of 截断, it is not a rare surface form of the word さいだん.

I agree that splitting out さいだん doesn't do much in the way of readability. I just wanted to explain that we're not too concerned about duplication of information.</upd_detl>
</audit>
<audit time="2022-08-16 04:10:29" stat="A" unap="true">
<upd_name>Stephen Kraus</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>It seems to me like a risky precedent to be duplicating entries over these tiny details, but maybe it won't be a big deal. At any rate, thanks for taking the time to hear out my concerns.</upd_detl>
</audit>
<audit time="2022-08-16 05:47:29" stat="A">
<upd_uid>jwb</upd_uid>
<upd_name>Jim Breen</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>We/I used to be more aggressive about mergers but we've backed off a bit to avoid having notes about subtle differences, etc. It's a case-by-case judgement.</upd_detl>
</audit>
<audit time="2022-08-18 23:54:04" stat="A">
<upd_uid>robin1354</upd_uid>
<upd_name>Robin Scott</upd_name>
<upd_email>...address hidden...</upd_email>
<upd_detl>I do see where Stephen is coming from. But ultimately I think this approach is the safest and most "correct". There's no ambiguity. And users don't have to infer that さいだん is rare; we've made it explicit.
It won't always be obvious how often a kanji form is used for particular readings, but in clear-cut cases like this I think it makes sense to split.</upd_detl>
</audit>
</info>
</entry>

If you have questions about, problems using, or suggestions for improving this page, please send email to jimbreen@gmail.com.
JMdictDB is an open-source Postgresql database and Python API for managing Japanese dictionary data developed by Stuart McGraw. More information at http://edrdg.org/~smg/. Please report software problems at https://gitlab.com/yamagoya/jmdictdb/issues or email to jmdictdb@mtneva.com.