This document collects some notes on the care and feeding of a JMdictDB instance and on tools and techniques useful for troubleshooting problems.

To install a new JMdictDB instance (code and database) see the Installation Guide.

For information about doing development work on JMdictDB see the Development Guide.

For information on changing the set of tags that can be applied to JMdictDB entries see the Tags Guide.

A project description page is at: https://www.edrdg.org/~smg/.
The JMdictDB source code is available at: https://gitlab.com/yamagoya/jmdictdb/.
Discussion takes place on the Google Groups Edict-JMdict mailing list (edict-jmdict+subscribe@googlegroups.com | https://groups.google.com/g/edict-jmdict/about | https://www.edrdg.org/jmdict_edict_list).

1. Document conventions and placeholders

In the sections that follow, replace the placeholders below with actual values:

{{DEVDIR}}

The local development directory where the JMdictDB code has been checked out from Git. Example value: ~/devel/jmdictdb/

{{URLROOT}}

URL root that the Apache (or other WSGI-capable) web server is configured to serve the JMdictDB pages under. Example value: /jmdictdb (the web server will serve the JMdictDB pages under the URLs https://localhost/jmdictdb/, e.g., https://localhost/jmdictdb/srchform.py)

{{WEBROOT}}

The directory where the web component files are installed and where the web server is configured to look for them. Example value: /usr/local/jmdictdb/. The default value is /var/www/jmdictdb/.

Example URLs show both "http:" and "https:" URL schemes; accessing the development Flask server will generally be via "http:" but the choice of "http:" or "https:" for the production web server will depend on the site configuration.

2. User management

When JMdictDB is accessed through the web pages, there are three levels of user privilege:

  • Anonymous: Any user has general read access to the JMdictDB web pages (e.g., search for entries) and can submit new or edited entries as unapproved submissions without any need to log in.

  • Editor: To approve entries a user must log in as an Editor.

  • Admin: A user logged in as Admin has, in addition to Editor privileges, the ability to manage other users (add, remove, modify, etc.)

There are two ways to manage users:

  • The users.py web page. This is accessible by logging in and then clicking your name (which is a link) beside the "Logout" button. This takes you to a page that lets you change the settings for your own account. If you have Admin privilege, there will also be two links, "Add User" and "List Users", that allow you to perform user admin actions on other users. If you lose access to this page (forgotten password, etc.) you can restore access with the bin/users.py program.

  • The bin/users.py command line program. To use this you must have access to a Postgresql database account with access to the JMdictDB database. Run bin/users.py --help for details on use. The installation procedure uses this script to add an initial Admin user.

3. Temporary downtime

Web access to the JMdictDB pages may be temporarily disabled for maintenance, because of excessive load, or to block access from a set of specific IP addresses. When this is done visitors (or specific visitors in the case of an IP block) to any JMdictDB pages will see a message to that effect.

This is done by creating a control file in a directory designated as the status directory. The location of the status directory is set in the configuration file by the STATUS_DIR setting. By default it is the {{WEBROOT}}/lib/ directory.

The control files are named "status_maint", "status_load" or "status_blocked". If either of the first two exists (contents are ignored), any web access to a JMdictDB web page will result in a redirect to the page "status_maint.html" or "status_load.html", which presents the user with a message that the system is unavailable due to maintenance or excessive load, respectively.

If the "status_blocked" control file exists, it should contain IP addresses, one per line. When a visitor’s IP matches one of the entries, a redirect to the page "status_blocked.html" will be returned. Lines that do not have the format of an IP address are ignored (as is any text after the first word on a line) and may be used for comments.
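As an illustration of the file format just described, the following Python sketch reads the text of a "status_blocked" style file and tests a visitor address against it. The function names are hypothetical and this is not the JMdictDB implementation; in particular it assumes that matching means exact address equality.

```python
import ipaddress

def parse_blocked(text):
    """Return the set of IP addresses listed in status_blocked-style text.

    Only the first word of each line is examined; lines whose first word
    is not a valid IP address (blank lines, comments) are skipped.
    """
    blocked = set()
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        try:
            blocked.add(ipaddress.ip_address(words[0]))
        except ValueError:
            continue    # Not an IP address: treat the line as a comment.
    return blocked

def is_blocked(visitor_ip, blocked):
    # Assumption: a visitor is blocked only on an exact address match.
    return ipaddress.ip_address(visitor_ip) in blocked
```

Text after the address, such as a note on why it was blocked, is ignored by the parser, so it can serve as an inline comment.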

The names of the control files and the location of the html files are not customizable although you can modify the contents of the html files.

It is up to you to create and remove the control files as appropriate.
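Creating and removing a control file can be scripted. The sketch below is one possible helper, not part of JMdictDB; the function name is hypothetical and "status_dir" stands for whatever directory the STATUS_DIR setting designates.

```python
from pathlib import Path

# The three control file names recognized by JMdictDB (from the text above).
CONTROL_FILES = ("status_maint", "status_load", "status_blocked")

def set_status(status_dir, name, enabled):
    """Create (enabled=True) or remove (enabled=False) a control file."""
    if name not in CONTROL_FILES:
        raise ValueError("unknown control file: %s" % name)
    path = Path(status_dir) / name
    if enabled:
        path.touch()                    # An empty file is created if needed.
    else:
        path.unlink(missing_ok=True)    # Requires Python 3.8 or later.
    return path
```

For example, set_status("/var/www/jmdictdb/lib", "status_maint", True) would enable the maintenance page when the default status directory is in use; the same call with False removes the file again.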

4. Upgrading JMdictDB

Development of JMdictDB is ongoing: updates to the code are made frequently and updates to the database schema occasionally. Updates to the code are cumulative: regardless of your current version, you can upgrade the code to the latest version without any intermediate steps. Updates to the database are not: all updates between the current database version and the latest one need to be applied to each JMdictDB database.

In general the upgrade procedure is to pull the current version of the code (containing the latest changes) from the GitLab repository, install it with the 'make' command, then apply any required database updates. In some cases additional or alternate steps will be required; these will be detailed in the update instructions.

If both a software and database upgrade are required, the JMdictDB web service may not be available from the time the software is updated until the database update completes successfully. During this period users may get a "database version error" page because the software and database will temporarily be at incompatible versions.

4.1. Code upgrades

Upgrades should be done from a clean Git checkout; the installer will install all the files it finds in the source directories, so installing from an active development clone where there may be extraneous working files is undesirable. The install directory (denoted by {{INSTDIR}} below) is temporary and may be deleted after the upgrade is completed. Its location is unimportant.

Alternatively, you can use an existing development repository if, after a 'git pull' to update the code and a 'git checkout edrdg' (or master), 'git status' reports no untracked files in the jmdictdb/ subdirectory.
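The untracked-file check can be automated. The following sketch, with a hypothetical function name, scans the output of 'git status --porcelain' for untracked entries under jmdictdb/. It is a simplification: paths that Git quotes because they contain unusual characters are not handled.

```python
def untracked_under(porcelain_output, prefix="jmdictdb/"):
    """Return untracked paths under `prefix` found in the text printed
    by 'git status --porcelain'."""
    paths = []
    for line in porcelain_output.splitlines():
        # Untracked entries are reported as "?? <path>".
        if line.startswith("?? "):
            path = line[3:]
            if path.startswith(prefix):
                paths.append(path)
    return paths
```

An empty result means the repository meets the condition above; any paths returned are files the installer would otherwise pick up.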

To do the upgrade:

Get a fresh clone of the JMdictDB code and check out the desired branch.

$ rm -rf {{INSTDIR}}
$ git clone https://gitlab.com/yamagoya/jmdictdb.git {{INSTDIR}}
$ cd {{INSTDIR}}
$ git checkout edrdg    # or master.

If you want to evaluate the upgraded code, you can do so at this point by starting the Flask local server as described in section 4, "The Flask debug server", of the Development Guide. However, if a database upgrade is also required you will also need to make a copy of the production database (to database "jmdev" for example), apply the database update(s) to it, then use an appropriate URL to access the Flask server with the upgraded database (e.g., http://localhost:5000/srchform.py?svc=jmdev).

To install the upgraded code system-wide do the following. The commands must be run as a root user, perhaps using 'sudo'.

If you have not previously done so, run:

# git config --global --add safe.directory {{INSTDIR}}

The above command is needed to override security protections added to Git in April 2022. For more details see: https://github.blog/2022-04-12-git-security-vulnerability-announced/. It updates the git configuration file and thus need only be run once and remains in effect for subsequent upgrades. [1]

Install the upgraded code with:

# cd {{INSTDIR}}
# make WEBROOT={{WEBROOT}} install-sys

If you are using a WSGI server don’t forget to reload the WSGI application per section 4.3, “Reload the WSGI application” below, even if you have no database updates.

The install directory can now be deleted.

# rm -rf {{INSTDIR}}

4.2. Database upgrades

In addition to updating the JMdictDB software as described above, sometimes upgrading the database schema is necessary to support new features.

Upgrading the database is done by executing one or more SQL script files with the Postgresql tool, psql, or a script that runs psql such as db/updates/update.sh. In addition to the actual schema changes made by the script, it also stores a database version (aka update) number, usually shown as a 6-digit hexadecimal number, in the database. [2]

When the JMdictDB software opens a connection to a JMdictDB database, it checks the database version number and will exit if the number does not match the number it expects (which is stored in the file jmdictdb/dbver.py). This is to reduce the chances of the code trying to access a database schema it does not fully understand.
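Since the version number is stored as an integer but quoted in documentation as a 6-digit hexadecimal string (see footnote 2), converting between the two forms is occasionally useful. The sketch below shows the conversion and a simple comparison of expected against present updates. The function names are hypothetical and the comparison is only an illustration, not the check in jmdictdb/dbver.py.

```python
def to_update_id(n):
    """Format an integer database version as a 6-digit hexadecimal string."""
    return format(n, "06x")

def from_update_id(s):
    """Convert a hexadecimal update id such as 'd30cfd' to an integer."""
    return int(s, 16)

def missing_updates(code_expects, db_has):
    """Return the update ids the code expects that the database lacks."""
    return sorted(set(code_expects) - set(db_has))
```

An empty list from missing_updates() corresponds to the compatible case; otherwise the list names the updates still to be applied.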

4.2.1. Backup the current jmdict database

$ pg_dump -Fc jmdict > <FILENAME>

where <FILENAME> is the name to use for the backup file. It can have any name and be located anywhere you want.
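If backups are taken regularly, a timestamped file name avoids overwriting earlier ones. The sketch below builds such a pg_dump command line; the naming scheme and function name are assumptions for illustration only. It uses pg_dump's -f option to name the output file instead of shell redirection.

```python
from datetime import datetime

def backup_command(dbname="jmdict", when=None):
    """Return (argv, filename) for a custom-format pg_dump of `dbname`."""
    when = when or datetime.now()
    # Hypothetical naming scheme: <dbname>-YYYYMMDD-HHMMSS.dump
    filename = "%s-%s.dump" % (dbname, when.strftime("%Y%m%d-%H%M%S"))
    # -Fc selects pg_dump's custom archive format; -f names the output file.
    return ["pg_dump", "-Fc", "-f", filename, dbname], filename
```

The returned argument list can be passed to subprocess.run(argv, check=True) so that a failed dump raises an error rather than passing unnoticed.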

4.2.2. Determine the correct database updates to apply

This step is usually not necessary since the update documentation will generally provide this information.

The full set of historical database updates is maintained in the db/updates/ directory. The update files are named using the convention:

nnn-xxxxxx.sql

where "nnn" is a 3-digit decimal number and "xxxxxx" is a 6-digit hexadecimal number. The former are usually sequential (but there may be gaps sometimes) and indicate the order in which the updates should be applied. The latter have randomly chosen values, actually identify a specific update, and are what are referenced in the documentation.
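The naming convention can be checked and used for ordering mechanically. The following sketch parses names of the form nnn-xxxxxx.sql and sorts them by sequence number. The function names are hypothetical, and the pattern assumes lowercase hexadecimal digits as in the examples in this document.

```python
import re

# Three decimal digits, a hyphen, six lowercase hex digits, then ".sql".
UPDATE_NAME = re.compile(r"^(\d{3})-([0-9a-f]{6})\.sql$")

def parse_update_name(filename):
    """Split 'nnn-xxxxxx.sql' into (sequence number, update id).

    Returns None for names that do not follow the convention."""
    m = UPDATE_NAME.match(filename)
    if not m:
        return None
    return int(m.group(1)), m.group(2)

def ordered_updates(filenames):
    """Return the conforming file names sorted by sequence number."""
    valid = [f for f in filenames if parse_update_name(f)]
    return sorted(valid, key=parse_update_name)
```

Other files in the directory, such as update.sh, do not match the pattern and are dropped from the ordered list.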

Generally the update documentation will indicate which of the update files need to be applied. If not, run the following command to show the current database version and whether or not it is compatible with the current JMdictDB code.

$ tools/dbversion.py jmdict

If the API and database versions are compatible, you’re all set. If not, it will report something like:

code expects updates: d30cfd
jmdict: incompatible, missing updates: d30cfd, has updates: e4aa1c

Then, look in db/updates/ for a series of updates that will bring the database from, in this example, e4aa1c to d30cfd. At the time of writing, there are two that follow 036-e4aa1c.sql:

036-e4aa1c.sql  037-46354d.sql  038-d30cfd.sql
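The selection step in this example can be expressed as a small function. This is a sketch under the assumption that the update history is linear and that the file list is already in application order; the function name is hypothetical.

```python
def pending_updates(ordered_files, current_id, target_id):
    """Return the update files that follow the one for `current_id`,
    up to and including the one for `target_id`.

    `ordered_files` must already be in application order. A ValueError
    is raised if either id has no corresponding file."""
    # "036-e4aa1c.sql" -> "e4aa1c"
    ids = [name.split("-", 1)[1].rsplit(".", 1)[0] for name in ordered_files]
    start = ids.index(current_id) + 1
    end = ids.index(target_id) + 1
    return ordered_files[start:end]
```

With the three files listed above, a current version of e4aa1c and a target of d30cfd select 037-46354d.sql and 038-d30cfd.sql.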

4.2.3. Apply the database updates

Run the db/updates/update.sh script to apply all the required updates. Assuming the updates:

037-46354d.sql
038-d30cfd.sql

are required, as determined from the update documentation or by means of the dbversion.py tool described above, they are conveniently applied by the update.sh script[3]:

$ cd db/updates/
$ ./update.sh jmdict 037-46354d.sql 038-d30cfd.sql

The first argument is the database to update; the remaining arguments are the update files to apply.
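For sites that prefer to drive the updates from Python, the sketch below builds the same psql invocations that footnote 3 attributes to update.sh. It is based only on that description, not on the script itself, and the function name is hypothetical.

```python
def update_commands(database, update_files):
    """Return one psql argument list per update file, in the order given.

    Mirrors the command form described in footnote 3:
        psql -Ujmdictdb -d<database> -f<update-file>
    """
    return [["psql", "-Ujmdictdb", "-d" + database, "-f" + f]
            for f in update_files]
```

Each argument list could then be run in turn with subprocess.run(cmd, check=True), which stops at the first failing update instead of continuing with later ones.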

If Postgresql is configured to do peer authentication for local users, you may get an error like:

psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL:  Peer authentication failed for user "jmdictdb"

In this case use either of the following two forms:

$ PGHOST=localhost ./update.sh jmdict 037-46354d.sql 038-d30cfd.sql
$ ./update.sh postgres://jmdictdb@localhost/jmdict 037-46354d.sql 038-d30cfd.sql

The update scripts are generally written to work as a single transaction: if there is a failure, all changes made by the script will be undone and, after the problem is resolved, the script can be rerun.

4.3. Reload the WSGI application

If you are serving the JMdictDB application via WSGI, you will probably need to tell the WSGI server to reload the updated application. For Apache with mod_wsgi you can do this by using the 'touch' command on the .wsgi file created during installation (see section 6.2.4, "Create a .wsgi file", of the Installation Guide).
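If the reload is part of a scripted upgrade, the effect of 'touch' can be reproduced in Python. This sketch, with a hypothetical function name, updates the file's modification time but, unlike 'touch', refuses to create a file that does not already exist.

```python
import os
from pathlib import Path

def reload_wsgi(wsgi_file):
    """Update the modification time of `wsgi_file`, like 'touch'."""
    path = Path(wsgi_file)
    if not path.is_file():
        # Avoid silently creating an empty .wsgi file in the wrong place.
        raise FileNotFoundError(wsgi_file)
    os.utime(path)    # Sets access and modification times to the current time.
```

Raising an error for a missing file guards against a mistyped path, which would otherwise leave the real .wsgi file untouched and the old code still running.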

4.4. If something goes wrong

If no database update was involved, a software update can be reversed by checking out the Git revision that was in use prior to the upgrade and reinstalling it with the 'sudo make …​ install-sys' command used in section 4.1, “Code upgrades”.

The Makefile does not use the traditional "file modification time" to decide whether to reinstall the target files; rather it runs an install script that will reinstall a target file if it is different (determined by checksum) than the source file. Thus the earlier versions of the source files should get properly reinstalled to their destination locations.
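The idea of installing by content comparison instead of modification time can be sketched as follows. This is not the JMdictDB install script: the function names are hypothetical and SHA-256 is an assumed choice of checksum, since the algorithm actually used is not stated here.

```python
import hashlib
import shutil
from pathlib import Path

def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def install_if_changed(src, dst):
    """Copy src to dst unless dst already has identical contents.

    Returns True if the file was (re)installed, False if left alone."""
    dst = Path(dst)
    if dst.exists() and file_digest(src) == file_digest(dst):
        return False
    shutil.copyfile(src, dst)
    return True
```

Because only contents are compared, an older source file still replaces a newer installed file that differs from it, which is what makes reverting to an earlier revision work.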

If the upgrade involved a database update, reversing it is more complex. If you have a backup of the database from before the upgrade and no activity (new submissions, etc.) has occurred since the upgrade, restoring from the backup is probably the best option. If that can’t be done then you will need to examine the database update file(s) and manually undo the changes they made, including removing the new database version number and activating the previous database version number (setting its "active" column value to True).

5. General troubleshooting

5.1. cgiinfo.py web page

If the JMdictDB web server is more-or-less operational, the cgiinfo.py web page can provide useful information on the server environment. Despite its name, it is not limited to the (now defunct) CGI backend and runs under the WSGI server as well.

Of particular interest are often the "pkg location" and "pkg_version" items in the Execution Info section; server code using the wrong or an outdated version of the software is a common cause of unexpected behavior.

The cgiinfo.py page can also show if the correct .ini files are being used, the location of the log file and the available database service names and databases.

5.2. Log files

There are several sources that may provide diagnostic information in the case of problems:

  • web server log files (OS dependent location)

  • postgresql log files (OS dependent location)

  • JMdictDB log files (typically in {{WEBROOT}}/lib/jmdictdb.log but location is defined in the config file and is shown in the cgiinfo.py web page.)

Note that the JMdictDB log file must be pre-created; the JMdictDB code will not create it automatically (see the Installation Guide). If it is not accessible or writable by the web server at web server startup, an error message to that effect will be written to the web server’s error log.

Also note that it is not truncated or rotated periodically; you must arrange for that.
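A quick pre-flight check of the log file can catch the two problems noted above before the web server is started. The sketch below uses a hypothetical function name, and its answer is only meaningful when run as the same user the web server runs as.

```python
import os

def log_file_problem(path):
    """Return a description of why `path` is unusable as a log file,
    or None if no problem was found."""
    if not os.path.isfile(path):
        return "log file does not exist (it must be created in advance)"
    if not os.access(path, os.W_OK):
        return "log file is not writable by the current user"
    return None
```

The path to check is the one defined in the config file, as shown on the cgiinfo.py page.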

5.3. Command line programs

tools/dbversion.py

Scans accessible Postgresql databases and, for those that are JMdictDB databases, reports the database version and whether or not it is compatible with the JMdictDB software. (Similar information is available from the cgiinfo.py web page if the web server is in a usable state.)

bin/shentr.py

Entries in the database can be examined directly, with no involvement from the web server, by the command line program bin/shentr.py. Run the program with the --help option for full details.

Appendix A: Operational tools

These are tools for use at installed JMdictDB sites and are located in bin/.

bulkupd.py

Allows making similar changes to a large number of database entries at once.

conj.py
dbcheck.py
dbreaper.py
entrs2xml.py

Produce a JMdict or JMnedict XML file from the database contents. This does the inverse of jmparse.py.

ex2txt.py

Produce a Tatoeba sentence examples file from data in a JMdictDB database. This does the inverse of exparse.py.

exparse.py

Parse a Tatoeba sentence examples file into a form suitable for loading into a JMdictDB database by pgload. Normally run by Makefile-db. This does the inverse of ex2txt.py.

jelload.py
jmdbss.txt
jmparse.py

Parse a JMdict or JMnedict XML file into a form suitable for loading into a JMdictDB database by pgload. Normally run by Makefile-db. This does the inverse of entrs2xml.py.

kdparse.py

Parse a kanjidic XML file into a form suitable for loading into a JMdictDB database by pgload. Normally run by Makefile-db. There is not currently a program to produce a kanjidic XML file from database data.

mklabels.py
pgload.py

Loads the intermediate file produced by jmparse.py, kdparse.py or exparse.py into a JMdictDB database. Normally run by Makefile-db.

shentr.py

A command line program that can display entries from a JMdictDB database.

users.py

A program for managing (adding, deleting, modifying) users in the jmsess database.

xresolv.py

This is run after pgload.py loads a corpus into a JMdictDB database and attempts to resolve (produce xrefs from) the unresolved xrefs generated by pgload.py. Normally run by Makefile-db.


1. If you wish, you can undo the Git configuration change after the install is done with: # git config --global --unset safe.directory {{INSTDIR}}
2. The database version number is stored as an integer in table "db" but generally referred to as a hexadecimal string available in the view "dbx".
3. The update.sh script simply runs the command `psql -Ujmdictdb -d<first-arg> -f<next-arg>`, for each of the second and subsequent arguments (denoted <next-arg>).