12. |
A 2024-03-26 17:53:18 Stephen Kraus <...address hidden...>
|
11. |
A* 2024-03-26 03:16:58 Jim Breen <...address hidden...>
|
|
Refs: |
https://github.com/JMdictProject/JMdictIssues/issues/122 |
|
Comments: |
Setting this up as an example of the approach developed in issue 122. |
|
Diff: |
@@ -5 +5 @@
-<keb>前門の虎後門の狼</keb>
+<keb>前門の虎、後門の狼</keb>
@@ -8 +8,2 @@
-<keb>前門の虎、後門の狼</keb>
+<keb>前門の虎後門の狼</keb>
+<ke_inf>&sK;</ke_inf>
@@ -11 +12 @@
-<reb>ぜんもんのとらこうもんのおおかみ</reb>
+<reb>ぜんもんのとら、こうもんのおおかみ</reb> |
10. |
A 2022-08-24 04:04:51 Jim Breen <...address hidden...>
|
|
Comments: |
Apologies - I didn't look hard enough at Stephen's proposal. I agree we shouldn't be using such fragments as "sK" forms, at least at this stage. |
9. |
A* 2022-08-24 00:45:26 Robin Scott <...address hidden...>
|
|
Comments: |
Whether or not we want to use search-only forms in this way, I don't think it's safe to do so as long as they're included with the other forms. Many sites/apps don't support the new tags yet. Some probably never will.
For the next generation of JMdict, I'd like to see search-only forms handled separately (i.e. in a different part of the XML). Then we can at least consider ideas like this.
And I agree with Marcus regarding the n-grams. The fact that both 前門の虎 and 後門の狼 have nearly identical counts suggests that they're almost always used together. Online search results appear to confirm this. |
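To make the compatibility concern concrete, here is a minimal sketch of the filtering a consumer has to do today to keep search-only forms out of its displayed headwords. It is not taken from any existing site or app: the sample entry is cut down from the diffs in this log, the inline DTD stands in for the real JMdict DTD, the expansion text of `&sK;` is an assumption, and `split_kanji_forms` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Cut-down sample entry. The inline DTD is a stand-in for the JMdict DTD,
# and the expansion text of &sK; is assumed, not copied from it.
SAMPLE = """<!DOCTYPE entry [<!ENTITY sK "search-only kanji form">]>
<entry>
<k_ele><keb>前門の虎、後門の狼</keb></k_ele>
<k_ele><keb>前門の虎後門の狼</keb><ke_inf>&sK;</ke_inf></k_ele>
<r_ele><reb>ぜんもんのとら、こうもんのおおかみ</reb></r_ele>
</entry>"""

SEARCH_ONLY = "search-only kanji form"  # assumed expansion of &sK;


def split_kanji_forms(entry):
    """Return (visible, search_only) lists of kanji forms for one entry."""
    visible, search_only = [], []
    for k_ele in entry.findall("k_ele"):
        keb = k_ele.findtext("keb")
        tags = [inf.text for inf in k_ele.findall("ke_inf")]
        # A form is search-only if any of its ke_inf tags says so.
        (search_only if SEARCH_ONLY in tags else visible).append(keb)
    return visible, search_only


entry = ET.fromstring(SAMPLE)
visible, hidden = split_kanji_forms(entry)
print("display:", visible)
print("index only:", hidden)
```

A consumer that skips this check would show every `keb` as a regular headword, which is the risk described above.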
|
Diff: |
@@ -9,16 +8,0 @@
-</k_ele>
-<k_ele>
-<keb>前門の虎</keb>
-<ke_inf>&sK;</ke_inf>
-</k_ele>
-<k_ele>
-<keb>後門の狼</keb>
-<ke_inf>&sK;</ke_inf>
-</k_ele>
-<k_ele>
-<keb>前門のとら</keb>
-<ke_inf>&sK;</ke_inf>
-</k_ele>
-<k_ele>
-<keb>後門のおおかみ</keb>
-<ke_inf>&sK;</ke_inf> |
8. |
A* 2022-08-23 13:36:14 Marcus Richert <...address hidden...>
|
|
Comments: |
I'm not sure we should be using sK for things like this. It feels like we could be opening a can of worms here. When the individual parts get more hits in the n-grams, that is often an artifact of how searchable the n-grams are rather than real evidence that the parts are frequent on their own. |
7. |
A 2022-08-23 06:33:42 Jim Breen <...address hidden...>
|
|
Comments: |
Indeed. |
6. |
A* 2022-08-22 19:41:36 Stephen Kraus <...address hidden...>
|
|
Refs: |
Google N-gram Corpus Counts
╭─ーーーーーーーー─┬───────┬───────╮
│ 前門の虎後門の狼 │ 0 │ 0.0% │
│ 前門の虎 │ 5,043 │ 47.6% │ 🡠 adding
│ 後門の狼 │ 4,499 │ 42.5% │ 🡠 adding
│ 前門のとら │ 522 │ 4.9% │ 🡠 adding
│ 後門のおおかみ │ 528 │ 5.0% │ 🡠 adding
╰─ーーーーーーーー─┴───────┴───────╯ |
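The percentage column appears to be each form's share of the combined count. A minimal sketch reproducing it under that assumption (the counts are copied from the table; the variable names are illustrative only):

```python
# Counts copied from the table above.
counts = {
    "前門の虎後門の狼": 0,
    "前門の虎": 5043,
    "後門の狼": 4499,
    "前門のとら": 522,
    "後門のおおかみ": 528,
}

total = sum(counts.values())

# Share of the combined count, formatted to one decimal place.
shares = {form: f"{100 * n / total:.1f}%" for form, n in counts.items()}

for form, share in shares.items():
    print(form, counts[form], share)
```

The two kanji fragments come out within about five percentage points of each other.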
|
Comments: |
Adding these sorts of hidden forms might make these long idiomatic entries easier to find. |
|
Diff: |
@@ -8,0 +9,16 @@
+</k_ele>
+<k_ele>
+<keb>前門の虎</keb>
+<ke_inf>&sK;</ke_inf>
+</k_ele>
+<k_ele>
+<keb>後門の狼</keb>
+<ke_inf>&sK;</ke_inf>
+</k_ele>
+<k_ele>
+<keb>前門のとら</keb>
+<ke_inf>&sK;</ke_inf>
+</k_ele>
+<k_ele>
+<keb>後門のおおかみ</keb>
+<ke_inf>&sK;</ke_inf> |
5. |
A 2022-04-06 09:14:33 Marcus Richert <...address hidden...>
|
|
Diff: |
@@ -5,0 +6,3 @@
+</k_ele>
+<k_ele>
+<keb>前門の虎、後門の狼</keb> |
4. |
A 2022-04-06 04:27:48 Jim Breen <...address hidden...>
|
3. |
A* 2022-04-06 02:40:33 Opencooper
|
|
Diff: |
@@ -14,0 +15 @@
+<gloss g_type="lit">a tiger at the front gate, a wolf at the back gate</gloss> |
2. |
A 2012-05-01 02:12:03 Rene Malenfant <...address hidden...>
|
1. |
A* 2012-04-30 21:22:07 Jim Breen
|
|
Refs: |
GG5, etc. |
|
Comments: |
66k hits vs 2k for 2417290 (後門の虎前門の狼) |