
Re: [edict-jmdict] Query slowdown after loading Kanjidic data with make loadkd





On Thu, May 4, 2017 at 2:02 PM, s mcgraw smcg6347@outlook.com [edict-jmdict] <edict-jmdict@yahoogroups.com> wrote:
 

Hello Gabriel, 


Hello Stuart,

Thank you for your quick reply. Also, thanks for writing all of this code and making it public and free and open source software. 


The problem I think you are having is that the 'make loadkd' drops all the
database indexes in order to be able to load the data at a reasonable speed.
To restore them do a 'make postload'. Without indexes, I'm surprised the
slowdown wasn't even bigger. 


Ah I see, that makes sense. Thank you for the tip. 


I also noticed that due to changes in kanjidic2 file over the last 3 years
(the last time I updated the kanjidic parser), I couldn't parse the current
version so I updated the parser. 


Yes, I had the same problem. The import failed in kdparse.py on the 'vietnam' r_type. I simply hacked around it, but having a proper import would be great. 

At the moment I am having a problem pushing the update to the repository
so I can email the updated file to you if you need it. 


That would be great! I am looking forward to the update. 


Also, I imagine you are super busy, but if you have the opportunity, could you please give me a pointer on how to efficiently look for examples in the database? I am currently using:


    SELECT DISTINCT
        e.id,
        kanj.txt,
        gloss.txt
    FROM
        entr AS e
        JOIN kanj ON kanj.entr = e.id
        JOIN gloss ON gloss.entr = kanj.entr
    WHERE kanj.txt LIKE '%{}%' AND e.src = ''
    LIMIT 1;

but this is a tad slow. I suspect it is slow because I am using LIKE with wildcards on both sides, but I haven't figured out any other way. 
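
For reference, here is a minimal sketch of how the '{}' placeholder in the query above could be filled from Python without string formatting. It assumes a DB-API driver that uses %s placeholders (psycopg2 does); the helper names like_contains and example_query are hypothetical, and the src value is left as a parameter because it is not shown in this message. This only makes the query safe to build; it does not by itself change how fast the LIKE match runs.

```python
# Sketch only: assumes a driver with the "%s" placeholder style (e.g. psycopg2).
# like_contains() and example_query() are hypothetical helper names.

EXAMPLE_SQL = """
SELECT DISTINCT e.id, kanj.txt, gloss.txt
FROM entr AS e
JOIN kanj ON kanj.entr = e.id
JOIN gloss ON gloss.entr = kanj.entr
WHERE kanj.txt LIKE %s AND e.src = %s
LIMIT 1;
"""


def like_contains(term):
    """Build a LIKE pattern that matches any text containing `term` literally.

    Backslash is the default LIKE escape character in PostgreSQL, so the
    backslash itself and the LIKE metacharacters % and _ are escaped with it.
    """
    escaped = (
        term.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    return "%" + escaped + "%"


def example_query(term, src):
    """Return (sql, params) ready to pass to cursor.execute()."""
    return EXAMPLE_SQL, (like_contains(term), src)
```

With an open cursor cur, this could be run as cur.execute(*example_query(term, src)), where term is the word to search for and src is whatever corpus value the query is meant to filter on.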

Best regards,
Gabriel


-- Stuart

On 05/03/2017 07:06 PM, 'Gabriel J.' Pérez Irizarry gabrieljoel@gmail.com [edict-jmdict] wrote:
> Hello!
>
> Many thanks for creating such a wonderful resource. What I am making would be impossible without it.
>
> I do 'make loadall' to get all the data and then run a few queries. The slowest query takes about 250 ms:
>
> SELECT DISTINCT
> e.id,
> kanj.txt,
> gloss.txt
> FROM
> entr AS e
> JOIN kanj ON kanj.entr=e.id
> JOIN gloss ON gloss.entr=kanj.entr
> WHERE kanj.txt LIKE '%食べる%' AND e.src=''
> LIMIT 1;
>
> (I am doing this to look for examples)
>
> If after doing 'make loadall' I do 'make loadkd' to add the Kanjidic data, then this same query starts taking double the time to run.
>
> The documentation says:
>
> * WARNING: kanjidic2 support is usable but incomplete.
>
> Could the slowdown be because the support is incomplete or am I making an error somewhere? Any thoughts are appreciated.
>
> Best regards,
> Gabriel
>