This document provides information useful when developing the JMdictDB code and database, to add new features or fix bugs.

To install a new JMdictDB instance (code and database), see the Installation Guide.

For information about the operation and maintenance of a JMdictDB instance, including performing code and database upgrades, see the Operations Guide.

For information on changing the set of tags that can be applied to JMdictDB entries, see the Tags Guide.

A project description page is at: https://www.edrdg.org/~smg/.
The JMdictDB source code is available at: https://gitlab.com/yamagoya/jmdictdb/.
Discussion takes place on the Google Groups Edict-JMdict mailing list (edict-jmdict+subscribe@googlegroups.com | https://groups.google.com/g/edict-jmdict/about | https://www.edrdg.org/jmdict_edict_list).

1. Document conventions and placeholders

In the sections below, replace these placeholders with actual values:

{{DEVDIR}}

The local development directory into which the JMdictDB code has been checked out from Git. Example value: ~/devel/jmdictdb/

{{URLROOT}}

URL root that the Apache (or other WSGI-capable) web server is configured to serve the JMdictDB pages under. Example value: /jmdictdb (the web server will serve the JMdictDB pages under the URLs https://localhost/jmdictdb/, e.g., https://localhost/jmdictdb/srchform.py)

{{WEBROOT}}

The location where the web component files are installed and where the web server is configured to look for them. Example value: /usr/local/jmdictdb/. The default value is /var/www/jmdictdb/.

Example URLs show both "http:" and "https:" URL schemes; accessing the development Flask server will generally be via "http:" but the choice of "http:" or "https:" for the production web server will depend on the site configuration.

2. Setting up a development environment

The JMdictDB code, like that of most projects, is maintained in a Git repository. The code may be checked out to multiple directories, with multiple development branches in each. This guide assumes familiarity with Git.

Given a Git checkout in directory {{DEVDIR}} (you may have several), the most convenient way of observing the web UI is to run the Flask development server and view the JMdictDB web pages with a web browser.

The command line programs are debugged in a terminal window as any Python program would be: inserting print statements, using a Python debugger etc.

For prerelease testing or previews it is useful to serve the pages via Apache using a prerelease staging environment. This replicates the final production environment as closely as possible and allows controlled access from outside the local site.

2.1. Get the code

You can clone the current version of the JMdictDB code from the GitLab "master" branch.

If you have installed a production environment as described in the Installation Guide you already cloned the code to {{DEVDIR}} during that procedure and can use that directory for development. If you wish to maintain a separate directory for development, repeat the procedure with an alternate {{DEVDIR}}:

$ git clone https://gitlab.com/yamagoya/jmdictdb.git {{DEVDIR}}
$ cd {{DEVDIR}}
$ tools/upd-version.py
$ tools/githooks.sh
post-checkout and post-commit hooks installed

See section 5.1 Get the code in the Installation Guide for details about the purpose of the two tools/* commands.

2.2. Development database

If you are installing JMdictDB on a dedicated machine for development, you can use the default "jmdict" database for development since no one else will care about its contents; a separate, dedicated development database is not necessary.

A "jmdict" database is created during a normal JMdictDB install as described in the Installation Guide.

Otherwise…

2.2.1. Use a dedicated database for development

If JMdictDB was installed for production use — that is, there are users who expect the data in the default "jmdict" database to be consistent and preserved — but you want to do development work on the code using the same Postgresql server, you will want to create a dedicated development database to prevent changing the contents of the live, production "jmdict" database.

You can use the production "jmdict" database for some development work if you are careful and keep in mind that changes you make to entries while running the development code will affect the live, user-visible data. Create new entries in the "test" corpus, for example. (But note that even this will have side effects, such as needlessly increasing the entry id and sequence numbers and producing "noise" lines in the JMdictDB log files.) Creating a dedicated development database is a better option.

2.2.2. Setting up a development database

The development database should be treated as transient; delete and recreate it as often as needed. It is also sometimes useful to have multiple development databases; just remember that their names have to be registered in the JMdictDB private configuration file (see 2.4.1, “Private configuration file (jmdictdb-pvt.ini)”) in order to access them via the web UI. The command line tools all accept a -d/--database or similar option to specify the database to access.

Below we will assume that the development database is named "jmdev"; you can name it whatever you wish as long as you:

  • change the name correspondingly in the steps below.

  • start the name with "jm" (some code that scans for JMdictDB databases assumes this convention).

  • avoid the names "jmnew", "jmold" and "jmtest01", which are used by other parts of JMdictDB.
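The naming rules above can be expressed as a small check. This is only a sketch: check_dev_dbname is a hypothetical helper, not part of JMdictDB, and its reserved-name set simply mirrors the list above.

```python
# Hypothetical helper, not part of JMdictDB: encodes the naming rules
# listed above for a development database name.
RESERVED_NAMES = {"jmnew", "jmold", "jmtest01"}

def check_dev_dbname(name: str) -> list:
    """Return a list of problems with NAME (an empty list if it looks OK)."""
    problems = []
    if not name.startswith("jm"):
        problems.append("name should start with 'jm'")
    if name in RESERVED_NAMES:
        problems.append(f"'{name}' is used by other parts of JMdictDB")
    return problems
```

For example, check_dev_dbname("jmdev") returns an empty list, while "devdb" and "jmnew" each produce one complaint.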

If you are using the "jmdict" database for development, use that name below rather than "jmdev".

If you haven’t already, follow the steps in the Installation Guide section 5.2: Configure the Postgresql database server to create and load a production database named "jmdict". This will accomplish other requirements such as creating the session database. You only need to do this one time on a given machine.

You can now create a development database by several means:

  • If you have a production "jmdict" database (for example after completing 5.3 Load the database of the Installation Guide) you can make a copy of it:

    $ createdb -O jmdictdb jmdev
    $ pg_dump jmdict | psql -d jmdev
  • Use the install makefile (Makefile-db) to create a new JMdictDB database, then load it from XML files. Follow the steps in the Installation Guide 5.3 Load the database. The new database will be named "jmnew" and can be renamed to "jmdev". For example:

    $ make -f Makefile-db jmnew
    $ make -f Makefile-db loadjm
    $ make -f Makefile-db loadex
    $ make -f Makefile-db postload
    $ psql -Upostgres -c 'ALTER DATABASE jmnew RENAME TO jmdev'
  • You can also accomplish the above results by creating a new "jmdev" database and running the XML import tools to load it directly. Look at how Makefile-db does it and the tools' --help options for guidance.

  • Create a new, empty JMdictDB database with no entries, rename it to "jmdev", then use the JMdictDB web UI to add entries:

    $ make -f Makefile-db jmnew
    $ psql -Upostgres -c 'ALTER DATABASE jmnew RENAME TO jmdev'
  • If you have a dump of a JMdictDB database created with pg_dump, you can load it directly. Note that the details of the pg_restore command will depend on how the dump file was created; see the Postgresql "pg_restore" documentation for more information. (A plain SQL dump, which is pg_dump's default output format, is loaded with psql rather than pg_restore, as in the first option above.)

    $ createdb -O jmdictdb jmdev
    $ pg_restore -d jmdev <filename>
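The first option above (copying the production database) can also be scripted. The sketch below builds and optionally runs the same createdb and "pg_dump | psql" commands shown above. The function names are hypothetical, and it assumes the Postgresql client tools are on PATH and that the current OS user can connect to both databases.

```python
import subprocess

def copy_db_commands(src, dst, owner="jmdictdb"):
    """Return the (createdb, pg_dump, psql) argument lists used to copy SRC to DST."""
    return (["createdb", "-O", owner, dst],
            ["pg_dump", src],
            ["psql", "-d", dst])

def copy_db(src, dst, owner="jmdictdb"):
    """Run the equivalent of: createdb -O OWNER DST; pg_dump SRC | psql -d DST"""
    create, dump, load = copy_db_commands(src, dst, owner)
    subprocess.run(create, check=True)
    # Connect pg_dump's stdout to psql's stdin, like the shell pipe.
    dumper = subprocess.Popen(dump, stdout=subprocess.PIPE)
    loader = subprocess.Popen(load, stdin=dumper.stdout)
    dumper.stdout.close()  # so pg_dump gets SIGPIPE if psql exits early
    load_rc, dump_rc = loader.wait(), dumper.wait()
    # psql exits 0 even when individual SQL statements fail unless
    # ON_ERROR_STOP is set, so a zero status is not a full guarantee.
    if dump_rc or load_rc:
        raise RuntimeError(f"copy failed: pg_dump={dump_rc}, psql={load_rc}")
```

Usage would be copy_db("jmdict", "jmdev"); keeping the command construction separate makes it easy to print the commands first and check them.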

2.3. Configuration file relationships

There are five configuration files (note1) that are not in the git repository and that need to be created when installing a JMdictDB instance:

  • Apache configuration directives

  • A .wsgi file, commonly named jmdictdb.wsgi

  • The main configuration file, commonly named jmdictdb.ini

  • A private config file, commonly named jmdictdb-pvt.ini

  • A log file that will receive log messages, jmdictdb.log

Full details about what goes in them are provided in the next section (2.4, “Configuration files”) and in section 2.6, “Testing/staging server”; this section is a quick overview focusing on how they relate to each other.

note1: There are actually only three "configuration files": the Apache directives might be part of an existing Apache configuration file, and the log file isn’t really a configuration file but does need to be created. There are also additional files that may need to be created for certain features such as a test user password file for running view tests. These are documented in the relevant documentation sections.

2.3.1. Apache configuration

These are Apache directives that control the operation of the mod_wsgi component and will go in a location determined by the Apache web server. The "WSGIScriptAlias" directive tells Apache’s mod_wsgi component where to find the .wsgi file that is used to load the JMdictDB application code:

WSGIScriptAlias [...] [...]/lib/jmdemo.wsgi

This file is only relevant when the JMdictDB app is being served via the Apache web server; it is not relevant when running the Flask server.

2.3.2. .wsgi file

This is an importable Python file that is run by the Apache mod_wsgi component. It is commonly located (in a development environment) in the web/lib/ directory. It identifies the main configuration file that the JMdictDB Flask application will read when it starts, jmdictdb.ini below.

cfgfile = [...], '../lib/jmdictdb.ini'))

This file is only relevant when the JMdictDB app is being served via the Apache web server; it is not relevant when running the Flask server.

2.3.3. Main and private configuration files, log file

When the JMdictDB app is run via the Apache web server, it reads the main configuration file given in the .wsgi file (jmdictdb.ini in the example above). When it is run via the Flask development server, the config file to use is given on the command line used to start the server.

The main configuration file contains references to two other files:

[config] PRIVATE: jmdictdb-pvt.ini

Defines which databases are available through the web UI and provides access credentials for them.

[logging] LOG_FILENAME: jmdictdb.log

Location and name of file that JMdictDB app logging messages will be written to.
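To see how the main configuration file refers to the other two, here is a sketch using Python's standard configparser. It assumes the option names shown above ([config] PRIVATE and [logging] LOG_FILENAME) and, as an assumption not stated in this guide, that relative file names are resolved against the directory containing the main configuration file; check the JMdictDB config-loading code for the actual rule.

```python
import configparser
import os.path

def referenced_files(main_cfg_text: str, main_cfg_path: str) -> dict:
    """Return the private-config and log file paths named in a main config.

    Assumption: relative names are resolved against the main config file's
    directory. A missing option is returned as None (no LOG_FILENAME means
    log output goes to stderr).
    """
    # interpolation=None so '%' characters in values are taken literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(main_cfg_text)
    base = os.path.dirname(main_cfg_path)

    def resolve(name):
        if name is None:
            return None
        return os.path.normpath(os.path.join(base, name))

    return {
        "private": resolve(cfg.get("config", "PRIVATE", fallback=None)),
        "logfile": resolve(cfg.get("logging", "LOG_FILENAME", fallback=None)),
    }

main_sample = """
[config]
PRIVATE = jmdictdb-pvt.ini

[logging]
LOG_FILENAME = jmdictdb.log
"""
print(referenced_files(main_sample, "/home/dev/jmdictdb/web/lib/jmdictdb.ini"))
```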

2.4. Configuration files

The installed production version of the JMdictDB code will live in a system-accessible location as will the configuration files it uses.

When running the JMdictDB code during development you will be running it directly from {{DEVDIR}} and the code will access the config files in {{DEVDIR}}/web/lib/.

2.4.1. Private configuration file (jmdictdb-pvt.ini)

This file contains credentials that the JMdictDB code will use to access the database. If there is an existing jmdictdb-pvt.ini file for access to the main "jmdict" database, you can give this file a different name (e.g., debug-pvt.ini) and set that name in the [config]/PRIVATE option in the main configuration file described below.

Change to directory {{DEVDIR}}/web/lib/, copy jmdictdb-pvt.ini-sample to jmdictdb-pvt.ini and edit it:

  • In the [flask] section, in the line, key = xxxxxxxxxxxxxxxx, replace the string of x’s with a passphrase or better, a string of random characters (preferably 16 characters or longer).

  • In the [DEFAULT] section replace the pw and sel_pw values ('yyyyyy' and 'zzzzzz') with the passwords for the jmdictdb (yyyyyy) and jmdictdbv (zzzzzz) users you created when following the steps in the Installation Guide section 5.2.1. Get access to the Postgresql server.

If using a dedicated development database, uncomment the lines in the "[db_jmdev]" section. Change "jmdev" here and in the "dbname" value if the development database name is different.

Uncomment the [db_jmtest01] section and change the usernames and passwords as described above. Web access to the jmtest01 test database is needed by the web page tests described in section 3.2, “Web page tests”.

If you want to prevent access by the web UI development code to the production database, remove the [db_jmdict] section entirely. Note though that this will not prevent access via the command line tools.

Make sure that jmdictdb-pvt.ini is not world-readable but is readable by the web server process, for example:

$ chgrp www-data jmdictdb-pvt.ini
$ chmod 640 jmdictdb-pvt.ini

(You may need a group name other than "www-data" on some systems; for example, often "httpd" on Red Hat-derived systems.)
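A quick way to confirm the permissions set above from Python. check_private_file is a hypothetical helper; it only checks the mode bits (no access at all for "other" users, group-readable) and does not verify that the file's group is actually the web server's group.

```python
import os
import stat

def check_private_file(path: str) -> list:
    """Return problems with the permission bits of PATH (empty list if none)."""
    mode = os.stat(path).st_mode
    problems = []
    if mode & stat.S_IRWXO:
        problems.append("file is accessible by 'other' users")
    if not mode & stat.S_IRGRP:
        problems.append("file is not group-readable (web server may not read it)")
    return problems
```

After the chgrp/chmod 640 commands above, check_private_file("jmdictdb-pvt.ini") should return an empty list.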

You can name the file whatever you wish; naming it jmdictdb-pvt.ini is not a requirement. Change the value of the [config]/PRIVATE option in the main configuration file (debug.ini in the example below) to match. Multiple *.ini files can share the same *-pvt.ini file.
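Assuming the private file is read as an ordinary .ini file with Python's configparser (an assumption; check the JMdictDB config code), values in its [DEFAULT] section, such as pw and sel_pw, are inherited by every other section. The sketch below lists the db_* service sections and shows that inheritance. The sample contents are illustrative only; compare them with jmdictdb-pvt.ini-sample.

```python
import configparser

def db_services(pvt_text: str) -> dict:
    """Map each db_* section name to its effective options ([DEFAULT] included)."""
    # interpolation=None so '%' characters in passwords are taken literally.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(pvt_text)
    return {name: dict(cfg[name]) for name in cfg.sections()
            if name.startswith("db_")}

pvt_sample = """
[DEFAULT]
pw = yyyyyy
sel_pw = zzzzzz

[flask]
key = 0123456789abcdef

[db_jmdev]
dbname = jmdev
"""
print(db_services(pvt_sample))
```

Here [db_jmdev] picks up pw and sel_pw from [DEFAULT] without repeating them, which is why the passwords only need to be set once.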

2.4.2. Main configuration file (debug.ini)

When running the Flask development server it is convenient to use a separate config file tailored to debugging with logging messages directed to the terminal rather than a log file and possibly different logging parameters. To avoid conflict with any existing jmdictdb.ini file, the config file for use with Flask can be named debug.ini.

Copy jmdictdb.ini-sample to debug.ini and edit it.

Change the [config]/PRIVATE option value if you are using a private config file named other than "jmdictdb-pvt.ini".

Change the default database service to that of the development database (unless you are using the jmdict database for development on a non-production machine). E.g., change:

#DEFAULT_SVC = db_jmdict

to

DEFAULT_SVC = db_jmdev

Comment out the "LOG_FILENAME = …" line. When it is commented out, log output will go to the terminal window Flask is being run in.

You may also want to change the logging level or enable certain logging messages in the [logging]/LOG_FILTERS section.
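If you prefer to script the edits above, here is a minimal text-based sketch. It works on lines rather than using configparser so that the comments in the sample file are preserved. make_debug_ini is hypothetical, and it assumes the DEFAULT_SVC and LOG_FILENAME lines look like the ones shown above.

```python
import re

def make_debug_ini(text: str, svc: str = "db_jmdev") -> str:
    """Return TEXT with DEFAULT_SVC set to SVC and LOG_FILENAME commented out."""
    out = []
    for line in text.splitlines():
        if re.match(r"\s*#?\s*DEFAULT_SVC\s*=", line):
            # Replace the (possibly commented-out) DEFAULT_SVC line.
            line = f"DEFAULT_SVC = {svc}"
        elif re.match(r"\s*LOG_FILENAME\s*=", line):
            # Commenting this out sends log output to the terminal.
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, from {{DEVDIR}}/web/lib/: Path("debug.ini").write_text(make_debug_ini(Path("jmdictdb.ini-sample").read_text())), using pathlib.Path.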

2.5. The Flask development server

The Flask web framework comes with a builtin development server which is the primary tool for debugging code during development. At this point you can:

$ cd {{DEVDIR}}/
$ tools/run-flask.py -d web/lib/debug.ini

(Creating a debug.ini file was suggested in section 2.4.2, “Main configuration file (debug.ini)”. Alternatively, you can specify the name of the regular jmdictdb.ini file, but that may send log messages to a log file rather than to the terminal.)

The run-flask.py command should respond with something like:

Using cfgfile: .../jmdictdb/web/lib/debug.ini
 * Serving Flask app 'jmdictdb.flaskapp' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
I werkzeug:  * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
I werkzeug:  * Restarting with stat
Using cfgfile: .../jmdictdb/web/lib/debug.ini
W werkzeug:  * Debugger is active!
I werkzeug:  * Debugger PIN: 590-602-843

You can now start a web browser and go to:

http://localhost:5000/

If all is well you will see the JMdictDB Advanced Search page.

You can append "?svc=jmdict" or similar to the URL to access any database service defined in the jmdictdb-pvt.ini file (or whatever file is given in the main .ini file’s [config]/PRIVATE option).
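Because svc is an ordinary query-string parameter, such URLs can be built with the standard library, for example in a script that checks several services. A small sketch; jmdictdb_url is a hypothetical name, and the svc value must name a service defined in your private configuration file.

```python
from urllib.parse import urlencode, urljoin

def jmdictdb_url(base: str, page: str, svc: str) -> str:
    """Build a URL for PAGE on server BASE that selects database service SVC."""
    return urljoin(base, page) + "?" + urlencode({"svc": svc})

print(jmdictdb_url("http://localhost:5000/", "srchform.py", "jmdict"))
```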

See section 4, “The Flask development server” below for more information.

2.6. Testing/staging server

While the Flask development server will be the most convenient server to use during development, you may want to serve the development code via a production web server to:

  • Verify there are no differences in results between the development server’s WSGI implementation and the production server’s (or debug those differences if there are.)

  • Provide beta or preview access to a new version of the software to a larger audience than Flask can serve.

  • Provide such access when some of the audience is off-site; the Flask server is not recommended for general access, for security as well as performance reasons.

This section will only describe configuring an Apache web server but the principles will be the same for any WSGI-capable server.

2.6.1. Get the code

You can serve the application code from a newly cloned JMdictDB directory or from an existing {{DEVDIR}} directory. We will refer to either as {{TESTDIR}} below.

2.6.2. Main and private configuration files

Create a main configuration .ini file based on the sample file:

$ cd {{TESTDIR}}/web/lib/
$ cp jmdictdb.ini-sample jmdictdb.ini

You can use some other name; it need not be jmdictdb.ini. Its name will be given in the .wsgi file (2.6.4, “Wsgi file”) or via an Apache environment variable.

Edit it and make changes similar to those described above in 2.4.2, “Main configuration file (debug.ini)”, but point LOG_FILENAME to a real file. (If it is left commented out, messages will go to stderr, which the web server may write to its own logs or discard.)

You can use the same jmdictdb-pvt.ini file (or whatever name you used; see 2.4.1, “Private configuration file (jmdictdb-pvt.ini)”) to give the web server access to the same set of databases, or create an additional *-pvt.ini file with different databases and change the value of [config]/PRIVATE in the jmdictdb.ini file to point to it.

Be sure to create the log file, as the JMdictDB app will not create it automatically. It must be writable by the web server.
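The log file is usually created with touch, chgrp and chmod in the shell. An equivalent Python sketch is below. ensure_log_file is hypothetical; it creates the file if it is missing, never truncates an existing log, and makes it group-writable. It assumes the web server runs in the file's group; changing the group (e.g. to www-data) typically needs root and is left to chgrp.

```python
import os

def ensure_log_file(path: str, mode: int = 0o660) -> None:
    """Create PATH if it does not exist and set its permission bits to MODE."""
    # O_APPEND | O_CREAT without O_TRUNC leaves an existing log intact.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, mode)
    os.close(fd)
    # The mode given to os.open is reduced by the umask; chmod is not.
    os.chmod(path, mode)
```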

2.6.3. Apache configuration directives

For more details on Apache configuration, see section 6.3. Configure the Apache webserver in the Installation Guide.

In this example the test/preview app will be made accessible under the URL path /jmdemo/ (e.g., https://localhost/jmdemo/). You can of course change "jmdemo" to anything you wish.

{{TESTDIR}} denotes the directory into which you cloned the JMdictDB code for running the test/preview server.

WSGIDaemonProcess jmdemo processes=1 threads=2 \
    display-name=apache2-jmdemo locale=en_US.UTF-8 lang=en_US.UTF-8
WSGIScriptAlias /jmdemo {{TESTDIR}}/lib/jmdemo.wsgi \
    process-group=jmdemo
# Serve static files directly without using the app.
Alias /jmdemo/web/ {{TESTDIR}}/web/
<Directory {{TESTDIR}}/web/>
    DirectoryIndex disabled
    Require all granted
</Directory>

Line 3 points to the .wsgi file, which in this example is placed in {{TESTDIR}}/lib/jmdemo.wsgi; you can use a different name or place it elsewhere and adjust that line to match if you wish.

2.6.4. Wsgi file

The .wsgi file is very similar to the one used for a production instance and described in section 6.2.4. Create a .wsgi file of the Installation Guide, except:

  • The local development directory’s path is inserted into sys.path so that the local jmdictdb package will be imported rather than the system-installed one.

  • The commented-out print statements can be uncommented to write some info about the environment to the Apache log files should there be a problem getting the JMdictDB app started.

    import sys, os, os.path as p
      # Add our root directory to sys.path so that our local jmdictdb
      #  package will be imported in preference to any system-installed one.
    our_directory = p.dirname (__file__)
    sys.path[0:0] = [p.normpath (p.join (our_directory, '../../'))]
    import jmdictdb
    #print ("jmdictdb: wsgi from %s" % __file__, file=sys.stderr)
    #print ("jmdictdb: jmdictdb from %s" % jmdictdb.__file__, file=sys.stderr)
    sys.wsgi_file = __file__   # See comments in views/cgiinfo.py.
    if not os.environ.get('JMDICTDB_CFGFILE'):
        cfgfile = p.normpath (p.join (our_directory, '../lib/jmdictdb.ini'))
        os.environ['JMDICTDB_CFGFILE'] = cfgfile
    #print ("jmdictdb: cfgfile from %s"%os.environ['JMDICTDB_CFGFILE'],file=sys.stderr)
    from jmdictdb.flaskapp import App as application

The .wsgi file does not need to have execute permission but does need to be readable by the webserver process.
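The two relative paths in the .wsgi file are resolved against the directory containing the .wsgi file itself. The sketch below repeats the same expressions so you can check where a given .wsgi location will look for the package root and the default config file; wsgi_paths and the example path are hypothetical.

```python
import os.path as p

def wsgi_paths(wsgi_file: str) -> tuple:
    """Return (directory added to sys.path, default cfgfile) for WSGI_FILE,
    using the same expressions as the .wsgi file above."""
    our_directory = p.dirname(wsgi_file)
    root = p.normpath(p.join(our_directory, '../../'))
    cfgfile = p.normpath(p.join(our_directory, '../lib/jmdictdb.ini'))
    return root, cfgfile

print(wsgi_paths("/home/dev/jmdictdb/web/lib/jmdemo.wsgi"))
```

For a .wsgi file at /home/dev/jmdictdb/web/lib/jmdemo.wsgi this gives /home/dev/jmdictdb and /home/dev/jmdictdb/web/lib/jmdictdb.ini, so with these expressions the .wsgi file needs to sit two directory levels below the checkout root for the local jmdictdb package to be found.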

3. Tests

The tests should be run frequently during development to confirm that changes made in one place don’t break things somewhere else.

Although these tests use the Python "unittest" testing framework, they are for the most part functional or integration tests, not unit tests. There are a large number of units and many have relatively complex behavior; there are not enough development resources available to do the analysis and creation of mock objects required for "pure" unit tests.

The testing infrastructure code is in the tests/ directory. Tests are in three groups:

  • Library tests in tests/tests/ — These are tests of the library modules in the jmdictdb/ subdirectory.

  • View code tests in tests/flaskt/ — These utilize the test client that comes with Flask to simulate url requests and generate the resulting html pages.

  • End-to-end web page tests in tests/e2e/ — These tests use the Selenium web driver (https://www.selenium.dev/) to automate a web browser to retrieve pages from a web server and analyze them.

Additionally, there is a fourth group of tests in tests/locust/ for concurrency testing. It uses the Locust testing framework and is not yet documented.

Test code is contained in Python modules in the tests/tests/, tests/flaskt/ and tests/e2e/ directories whose names start with "test_". Test cases are methods in the test classes whose names start with "test". Data for the tests is in data/.

3.1. Library tests

The tests can be run either by a project-provided tool, "runtests.py", or by Python’s builtin "unittest" module. Using runtests.py is usually more convenient.

3.1.1. Using runtests.py

A test runner program, runtests.py, was written that provides more control of the tests run and the results format than the unittest module does (or did, when runtests.py was written.)

The command (run in python/tests/):

./runtests.py

will run all library tests. To run a specific test or set of tests, specify them as 'module name', 'class' and 'test'. Examples:

./runtests.py test_jelparse
./runtests.py test_jelparse.Roundtrip
./runtests.py test_jelparse.Roundtrip.test1000290

By default tests are looked for in the tests/ subdirectory of the main tests/ directory, but a different directory can be specified with runtests.py’s --dir option.

Tests can also be specified by filename path by including a "/" character in the test argument. When this is the case the --dir argument is ignored. To run the tests in the file "newtest.py" located in the main tests/ directory (not tests/tests/):

./runtests.py ./newtest.py

Leave off the ".py" when adding a test class or test method:

./runtests.py ./newtest.MySql.test_0018
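The rule for test arguments can be illustrated with a small parser. This sketches the behavior described above; it is not runtests.py's actual code, split_test_arg is a hypothetical name, and its default directory "tests" stands for the tests/ subdirectory described above.

```python
import os.path

def split_test_arg(arg: str, default_dir: str = "tests") -> tuple:
    """Split a runtests.py-style argument into (module file, [class, method]).

    Arguments containing '/' are file paths and default_dir is ignored;
    other arguments are module names looked up in default_dir.
    """
    if "/" in arg:
        directory, tail = arg.rsplit("/", 1)
    else:
        directory, tail = default_dir, arg
    if tail.endswith(".py"):
        module, names = tail[:-3], []
    else:
        module, *names = tail.split(".")
    return os.path.join(directory, module + ".py"), names
```

For example, "./newtest.MySql.test_0018" splits into ("./newtest.py", ["MySql", "test_0018"]) and "test_jelparse.Roundtrip" into ("tests/test_jelparse.py", ["Roundtrip"]).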

Multiple test arguments can be given. To debug test failures:

python3 -mpdb runtests.py -d [tests...]

This will start the tests under the Python debugger (type 'c' to continue and start running them) and will stop at the first error or failure, allowing the use of pdb to diagnose the problem. (Note that it will stop at any exception, including intentional ones such as SkipTest, so this is best used when running specific individual tests.)

For more details on argument syntax: ./runtests.py --help

3.1.2. Using Python’s unittest command

To run all the tests:

cd python/tests
python3 -m unittest tests/test_*.py

or

cd python/tests
python3 -m unittest discover -s tests

3.2. Web page tests

3.2.1. Prerequisite setup

The Selenium (https://selenium.dev) web driver will need to be installed on the development/test machine. The test code is currently hardwired to use Firefox as the target browser in the setUpModule() functions of the web test files.

The web tests need to access the jmtest01 database via the svc URL parameter; you will need to have a [db_jmtest01] section in the private configuration file (e.g., jmdictdb-pvt.ini) used when running the Flask debug server. See sections 2.5, “The Flask development server” and 4, “The Flask development server” for details about setting up and running the Flask development server.

The web page tests assume the existence of a JMdictDB editor account that the tests can log in as. A separate JMdictDB user is recommended for this purpose, although an existing real user can be used if desired. If a dedicated test user account is used, it can be created manually, one time, with either the JMdictDB users.py web page or the bin/users.py command line tool. (This assumes the typical case where both the production server’s and the Flask development server’s configuration files specify the same "jmsess" user database and thus share the same set of JMdictDB users.) The account userid, full name and password may be whatever you wish. They should be placed in a file named:

tests/testuser.pw

in the order given, one per line. Case must match. Unless all users on the machine are trusted, read access should be limited to the user who will be running the tests.
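Test code that needs these credentials only has to read three lines. Here is a sketch of such a reader; read_testuser is hypothetical (see the web test modules for how they actually read the file), and it warns if group or other users have any access to the file, in line with the recommendation above.

```python
import os
import stat
import warnings

def read_testuser(path: str = "tests/testuser.pw") -> tuple:
    """Return (userid, full name, password) from a testuser.pw-style file."""
    if os.stat(path).st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        warnings.warn(f"{path} is accessible by other users")
    with open(path, encoding="utf-8") as f:
        # Strip only line endings; spaces may be part of a name or password.
        lines = [line.rstrip("\r\n") for line in f]
    if len(lines) < 3:
        raise ValueError(f"{path}: expected userid, full name and password lines")
    userid, fullname, password = lines[:3]
    return userid, fullname, password
```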

3.2.2. Running the web page tests

Start the Flask web server using the default port of 5000. (That port is currently hardwired into the test code.)

To run all the tests:

./runtests.py --dir web

You can run individual tests the same way as for the library tests by specifying the test module / test class / test method names, "." separated, as arguments to runtests.py.

3.3. Test construction

In the test methods in the test modules, "_" is often used instead of the conventional "self" as the name of the first parameter for brevity.

The test methods often call a helper function to do the grunt work. These functions are typically called with the test case object as the first argument (since they often need access to the test case object's .assert* methods). They have exactly the same form as they would have as test case methods, but using them as functions allows them to be called from multiple test case classes. This seems simpler than the alternative of a hierarchy of test case subclasses, possibly with multiple-inheritance mixins.
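A minimal illustration of the pattern: a module-level helper takes the test case as its first argument ("_", following the convention above) so that it can call the assert methods, and it is shared by two test case classes. All names are made up for the example.

```python
import unittest

def check_roundtrip(_, text, parse, unparse):
    # Shared helper: "_" is the calling TestCase instance, so the
    # helper can use its assert* methods exactly as a method would.
    _.assertEqual(unparse(parse(text)), text)

class IntRoundtrip(unittest.TestCase):
    def test_positive(_):
        check_roundtrip(_, "42", int, str)

class FloatRoundtrip(unittest.TestCase):
    def test_simple(_):
        check_roundtrip(_, "1.5", float, str)
```

Saved as a test_*.py module, it runs like any other test module, e.g. with python3 -m unittest.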

Because most tests do not adhere to unit testing principles, isolation between tests is not a high priority. Consequently, caching is often used to avoid time-consuming operations such as database creation. Database connections and things like JEL parser instances are created once per test runner execution and reused for multiple tests.

An important data object is jdb.KW, a collection of static keyword tables usually initialized by jdb.dbOpen(). Because this is a module global, it retains state between the execution of test modules. It is critical that any test module that will reference it call jdb.dbOpen() or DBmanager.use() at least in a setUpModule() function, if not in the test case or test setUp() functions. In particular, making those calls outside of any function will execute them at test module import time; should any other test (even in a different test module) then change the contents, all following tests (that don’t call jdb.dbOpen()/DBmanager at run time) will see the changes too.

If test module code references other files (data files or modules to import), the references should be relative to the python/tests/ directory, not the python/tests/tests/ directory the test modules are in.

3.4. The tests database

Prior to rev git-190508-02d9fdb the JMdictDB tests used the live "jmdict" database (loaded from the EDRDG jmdict XML file) as a source for test data. This was an unfortunate choice made when the tests were first implemented based on the erroneous assumption that most existing entries were stable and unlikely to change frequently. In fact, entries are constantly being edited resulting in the need to constantly revise tests to keep up.

A static test database named "jmtest01" was developed by identifying the entries used in the live tests (plus some additional ones that provide xref targets) and extracting them from a recent jmdict XML file, loading them into a new, empty JMdictDB database, from which a loadable copy was produced using Postgresql’s pg_dump tool. The process used is documented in note 1 below.

3.4.1. Loading the tests database in test code

Test code should load a test database by importing DBmanager from module jmdb.py and calling DBmanager.use(DBNAME,DBFILE), where DBNAME is the name of a database to create in the Postgresql server and DBFILE is the name of a file containing a Postgresql database dump of the data to load.

The first time the dump file is loaded by jmdb.DBmanager.use(), a hash of the dump file is calculated and saved in the database. When tests are run again later, the database hash is checked against the hash of the requested dump file, and if the former does not match or does not exist, the database is reloaded from the dump file. If there is a match, the correct database is already loaded and can be used without a time-consuming reload. This of course presumes that tests do not make any modifications to the database that would invalidate it for subsequent tests. If that is not the case, the test should invalidate the database hash to force a reload the next time tests are run. This can be done by deleting the row(s) in testsrc or by updating testsrc.hash to an invalid value like ''.
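The reload decision described above comes down to comparing two hashes. The sketch below shows that logic with the stored hash passed in as a plain value; it is not the DBmanager code. The hash algorithm (SHA-256 here) and the function names are assumptions, and reading or writing testsrc.hash in the database is left out.

```python
import hashlib
from typing import Optional

def file_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at PATH, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def needs_reload(dump_file: str, stored_hash: Optional[str]) -> bool:
    """True if the database must be reloaded from DUMP_FILE.

    STORED_HASH is the hash saved when the database was last loaded, or None
    if there is none. An invalid value such as '' never matches, which is
    how a test can force a reload on the next run.
    """
    return not stored_hash or stored_hash != file_hash(dump_file)
```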

3.4.2. Manually forcing a reload of the tests database

The script ./load_testdb.sh can be used to unconditionally force a reload of the test database. This is useful, for example, before making and saving changes to the test database to assure the changes are made to a clean, unmodified version of the database.

load_testdb.sh is a simple script that effectively accomplishes the following:

$ dropdb --if-exists jmtest01
$ createdb -O jmdictdb jmtest01
$ psql -d jmtest01 -f python/tests/data/jmtest01.sql

In addition, it saves a hash of the .sql file to a table in the database. This allows the tests to verify that the correct version of jmtest01 is loaded and to reload it if need be.

3.4.3. Updating the tests databases

When new tests are added it is sometimes necessary to add new test entries to the test database(s). Additionally, if the JMdictDB database schema is updated to a new version, the test database must be updated to match.

Maintenance on the test database can be performed by loading it into Postgresql and using the JMdictDB web or command line tools, or the usual Postgresql tools (e.g. psql). If using the web pages to edit/add/delete entries, you’ll need to add a "service" section to the active config.ini (or config_pvt.ini) file to make the test database accessible to the web UI via the "svc=…" URL parameter.

To update the test database to the current JMdictDB database version:

Load a fresh copy of the test database:

$ cd tests
$ ./load-testdb.sh data/jmtest01.sql

Then apply the latest database updates as you would to the main "jmdict" database:

$ psql -d jmtest01 -U jmdictdb -f ../db/updates/nnn-xxxxxx.sql

And when all changes are complete, save it:

$ pg_dump -d jmtest01 > data/jmtest01.sql

4. The Flask development server

The primary tool for developing and debugging the web interface is the local web server built into the Flask web framework. It can be started with:

tools/run-jmapp.py {{CONFIG-FILE}}

where {{CONFIG-FILE}} is the configuration file to use. The usual config file will have log messages directed to a file and certain log levels suppressed or promoted. When running under the Flask server for debugging purposes it is often better to run with a config file that directs log messages to stderr so you can see them in the terminal window that is running the Flask server.

Make a copy of the {{DEVDIR}}/web/lib/jmdictdb.ini file to, say, debug.ini. Edit it and comment out the "LOG_FILENAME = …" line, which will cause messages to go to stderr instead. You probably also want to see at least all "INFO" level messages, so "LOG_FILTERS" should have at least the line I^.* in it. You can then, from {{DEVDIR}}, run:

$ tools/run-jmapp.py web/lib/debug.ini

and browse to http://localhost:5000/ to get the search page.

The Flask server will run the code in {{DEVDIR}}, not the installed code, so you can modify the code and view the effects, insert pdb breakpoints for debugging etc.

For full details see the Flask documentation at https://flask.palletsprojects.com/en/2.0.x/server/

The Flask server will by default access the same "jmdict" database as the installed production server, and changes made to entries will affect the same entries used to produce the JMdict and JMnedict XML files. To access a different "throw-away" database, see section 2.2, “Development database” and use a "svc" URL parameter (e.g., "…/srchform.py?svc=jmtest") to direct the web server to it.

5. JMdictDB development process

Changes can be categorized as:

  1. Changes to the Python code or other files that don’t affect the database.

  2. Changes to the database schema or static table content. These changes will typically involve modifying the files in db/* or jmdictdb/data/kw*.csv.

Changes in the second category will require, in addition to changes made to the files themselves, the creation of a SQL script that will update any existing databases with the same changes.

The actual process may depend on the exact nature of the changes but will typically follow the list below. If the changes don’t involve the database, steps 5.5 to 5.10 can be skipped. And of course the steps before the final commit are iterative, repeated until the desired changes are successfully implemented.

5.1. Create a git branch for the development work

It is often most convenient to do work in a new branch although this is not required.

$ cd {{DEVDIR}}
$ git status

Git should say "working tree clean" ("working directory clean" in older versions) or list only "untracked files".

$ git checkout master
$ git checkout -b <new-branch-name>

You can replace <new-branch-name> with any branch name you want.

5.2. Run the tests

$ cd tests
$ ./runtests.py

If there are errors they should be fixed before continuing. This will prevent confusing preexisting errors with errors resulting from your changes.

5.3. Write tests for the changes

Preferably tests for the new behavior should be written prior to implementing the changes. The tests should fail now but pass later after the changes have been implemented. That’s how it works in an ideal world. :-)

But even if the tests are written later after the changes are implemented, they should be written and be part of the final commit.

5.4. Update the code

Update the code as needed. The Flask server (see section 4, “The Flask development server”) can be used for viewing the web pages during development.

5.5. Update the database files (if needed)

This will typically involve changes to the files in db/ that define the database schema or in jmdictdb/data/ that define the contents of static tables (e.g., kw*.csv).

Update (aka version) numbers are assigned to the database (stored in table "db") and the JMdictDB code checks the database update number when it opens a connection to the database. If the retrieved number is not the one expected by the code, an exception is raised, which generally terminates the program being run.

So a change involving the database schema or some static data requires:

  • Changing the database update number in db/mktables.sql

  • Changing the database update number in jmdictdb/dbver.py

  • Writing a SQL script to change the database update number in existing databases (along with making the schema/data changes themselves).

There are two kinds of updates:

  • "replacement" update: The changes to the database are incompatible with the pre-change code (i.e., they have potential to result in erroneous code execution.)

  • "supplementary" update: The pre-change code will work OK with an updated database but the update number is used to record that the database has had the update applied.

5.5.1. Replacement update

When the database schema or table contents are changed in a way that could result in pre-update JMdictDB code failing, a new update number is added and the existing update number(s) deactivated. This effectively replaces the previous update number(s) with a new one and prevents older code that doesn’t understand the newer database changes from attempting to access it.

5.5.2. Supplementary update

Some changes do not affect the ability of the pre-update version of the code to work with the post-update version of the database. Examples could be changes to tags that aren’t referenced anywhere in the code, or the addition of database objects (tables, views, etc.) that aren’t referenced in the code yet. In this case we add an additional update number. The earlier update number remains active and controls code access to the database, but the additional number records that the supplementary update has been applied to the database.
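The difference between the two kinds of update comes down to which numbers are still marked active in the "db" table. The sketch below models that rule, assuming the code requires every number in its expected list to be active; check_db_version and the supplementary number 1a2b3c are hypothetical, while b5d00f and cda09c are the numbers from the dbver.py example in 5.7.2.

```python
from __future__ import annotations


def check_db_version(active_ids: set[int], expected: list[int]) -> None:
    """Raise RuntimeError unless every update number the code expects
    is among the database's active update numbers."""
    missing = [v for v in expected if v not in active_ids]
    if missing:
        raise RuntimeError(
            "database not compatible with this code; missing update(s): "
            + ",".join(f"{v:06x}" for v in missing))


CODE_DBVERS = [0xb5d00f]          # what the pre-update code expects

# Supplementary update: b5d00f stays active and a new number is added.
check_db_version({0xb5d00f, 0x1a2b3c}, CODE_DBVERS)    # passes

# Replacement update: b5d00f is deactivated in favor of cda09c.
try:
    check_db_version({0xcda09c}, CODE_DBVERS)
except RuntimeError as exc:
    print(exc)
```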

5.6. Generate an update number

Database update (aka version) numbers are random 6-digit hexadecimal numbers. A quick way to get one is to run the following code:

$ python3 -c 'import random;print("%06.6x"%random.randint(0,16777215))'
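The same thing as a reusable function, with a guard against reusing a number that is already in use. This is only a sketch: secrets.randbelow is used merely as a convenient random source (the random-based one-liner above is equally fine), and how the existing numbers are collected (e.g., from the db/updates/ file names) is left open.

```python
from __future__ import annotations

import secrets
from collections.abc import Container


def new_update_id(existing: Container[str] = ()) -> str:
    """Return a random 6-digit lowercase hex update number that is not
    already in `existing`."""
    while True:
        candidate = f"{secrets.randbelow(1 << 24):06x}"  # 000000 .. ffffff
        if candidate not in existing:
            return candidate
```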

5.7. Update the database version number in the source code

There are three places the database update (aka version) number is specified; they must all be compatible.

  • db/mktables.sql file:
    Sets the version number(s) for newly created databases.

  • jmdictdb/dbver.py file:
    Sets the database version number(s) expected by the JMdictDB code.

  • db/updates/nnn-xxxxxx.sql:
    Sets the version number(s) in existing databases.

5.7.1. db/mktables.sql

This specifies the database update number that will go into the "db" table when a new JMdictDB database is created.

If this is a replacement update (see 5.5.1, “Replacement update” above), then find the line near the top of db/mktables.sql that looks like:

\set updateids '''zzzzzz'''

where "zzzzzz" is a 6-digit hexadecimal string, or possibly several such strings separated by commas, and replace everything inside the quotes with the new 'xxxxxx' update number.

If this is a supplementary update (see 5.5.2, “Supplementary update” above), add "xxxxxx" to the end of the existing numbers:

\set updateids '''zzzzzz,xxxxxx'''

5.7.2. jmdictdb/dbver.py

If this is a replacement update (see 5.5.1, “Replacement update” above) then replace the existing version number(s) with the new one(s). For example, rev 220826-7340b74 updated the database to version cda09c and replaced:

DBVERS = [0xb5d00f]

with:

DBVERS = [0xcda09c]

If it is a supplementary update (see 5.5.2, “Supplementary update” above) then no change to dbver.py is needed; by definition a supplementary database update adds features to the database that will not prevent the existing version of the code from working correctly, so the code need only check for the pre-update version of the database.
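Since db/mktables.sql and jmdictdb/dbver.py must agree, a quick consistency check can catch a forgotten edit. The sketch below is hypothetical (it is not a JMdictDB tool); it assumes the "\set updateids '''...'''" and "DBVERS = [0x...]" line formats shown in 5.7.1 and 5.7.2, and treats the files as compatible when every number in DBVERS also appears in updateids. That covers both the replacement case (same numbers) and the supplementary case (updateids has extra numbers). The third place, the update script, is not checked.

```python
from __future__ import annotations

import re
from pathlib import Path

# "\set updateids '''b5d00f,1a2b3c'''" in db/mktables.sql
_UPDATEIDS_RE = re.compile(r"\\set\s+updateids\s+'''([0-9A-Fa-f,\s]+)'''")
# "DBVERS = [0xb5d00f]" in jmdictdb/dbver.py
_DBVERS_RE = re.compile(r"^DBVERS\s*=\s*\[([^\]]*)\]", re.MULTILINE)


def mktables_ids(sql_text: str) -> set[int]:
    m = _UPDATEIDS_RE.search(sql_text)
    if not m:
        raise ValueError("no \\set updateids line found")
    return {int(s, 16) for s in m.group(1).split(",") if s.strip()}


def dbver_ids(py_text: str) -> set[int]:
    m = _DBVERS_RE.search(py_text)
    if not m:
        raise ValueError("no DBVERS assignment found")
    return {int(h, 16) for h in re.findall(r"0x([0-9A-Fa-f]+)", m.group(1))}


def versions_compatible(sql_text: str, py_text: str) -> bool:
    """True if newly created databases carry every number the code expects."""
    return dbver_ids(py_text) <= mktables_ids(sql_text)


def check_devdir(devdir: Path) -> bool:
    """Hypothetical helper: run the check on a {{DEVDIR}} checkout."""
    return versions_compatible(
        (devdir / "db" / "mktables.sql").read_text(encoding="utf-8"),
        (devdir / "jmdictdb" / "dbver.py").read_text(encoding="utf-8"))
```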

5.8. Write an update script to update existing databases

Create a new database update file in db/updates/ named:

nnn-xxxxxx.sql

where 'nnn' is a sequential 3-digit decimal number (one higher than the highest number among the update files already in db/updates/) and 'xxxxxx' is the database update number generated above. 'nnn' serves only as an aid in applying the updates in the right order; it has no significance beyond that.
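Picking 'nnn' can be automated. A minimal sketch, assuming every update file follows the nnn-xxxxxx.sql pattern with lowercase hex, that other files in the directory can be ignored, and that numbering would start at 001 in an empty directory; next_update_filename and next_update_path are hypothetical helpers.

```python
from __future__ import annotations

import re
from pathlib import Path

_UPDATE_NAME_RE = re.compile(r"^(\d{3})-[0-9a-f]{6}\.sql$")


def next_update_filename(existing: list[str], update_id: str) -> str:
    """Return 'nnn-<update_id>.sql' with nnn one above the highest in use."""
    nums = [int(m.group(1)) for name in existing
            if (m := _UPDATE_NAME_RE.match(name))]
    return f"{max(nums, default=0) + 1:03d}-{update_id}.sql"


def next_update_path(updates_dir: Path, update_id: str) -> Path:
    names = [p.name for p in updates_dir.iterdir()]
    return updates_dir / next_update_filename(names, update_id)
```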

Use one of the templates in Appendix D: Template for database (replacement) update script or Appendix E: Template for database (supplementary) update script depending on the type of update being implemented.

The update, when applied to a database of the required version, should produce a database schema and static table contents that exactly match those of a database created from scratch by the install procedure (see install.adoc).
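One way to confirm the schema part of this is to compare schema-only dumps of an updated database and a freshly installed one. The sketch below assumes pg_dump is on the PATH and can connect to both databases; the database names are placeholders, and the static table contents would need a separate comparison.

```python
from __future__ import annotations

import difflib
import subprocess


def dump_schema(dbname: str) -> str:
    """Return a schema-only SQL dump of a database (requires pg_dump)."""
    return subprocess.run(
        ["pg_dump", "--schema-only", "--no-owner", "-d", dbname],
        capture_output=True, text=True, check=True).stdout


def schema_diff(a: str, b: str, a_name: str = "a", b_name: str = "b") -> list[str]:
    """Unified diff of two schema dumps; an empty list means they match."""
    return list(difflib.unified_diff(
        a.splitlines(keepends=True), b.splitlines(keepends=True),
        fromfile=a_name, tofile=b_name))
```

For example, print("".join(schema_diff(dump_schema("jmtest01"), dump_schema("jmfresh"), "updated", "fresh"))), where "jmfresh" stands for a database created by the install procedure.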

Note that the database changes are done inside a transaction: if there is an error when you run the script, the database is left unchanged, so you can simply correct the error and rerun the script; there is no need to undo a partially applied update.

5.9. Test the database update script

You’ll want to test the database update script on both the test database (confirming all the tests still pass) and a copy of a production database.

5.9.1. Test with the test database

First, load a fresh, clean copy of the test database:

$ cd tests
$ git status
[confirm that file data/jmtest01.sql has not been modified;
if it has, restore the unmodified version from git before
continuing.]
$ ./load-testdb.sh data/jmtest01.sql

Apply the database update to it:

$ psql -d jmtest01 -Ujmdictdb -f ../db/updates/nnn-xxxxxx.sql

If there were errors applying the update, fix them and repeat the above steps including reloading the test database from the jmtest01.sql file.

After the update applies cleanly, save a temporary copy of the updated jmtest01 database. It is important to do this before running any tests as some tests may modify the database.

$ pg_dump jmtest01 >data/jmtest01.sql-new

Run the tests again:

$ ./runtests.py

If any errors occur in the tests the problem may be due to a problem in the database update script (in which case correct it and continue from the start of this section), or some tests themselves may need updating to work with the updated database. In the latter case, fix the tests until all pass.

5.10. Update and save the tests database

It is critical to load a fresh copy of the tests database, apply the database update(s) and then save the updated version. DO NOT save a copy after running any tests; the tests make changes to the data that should not be saved.

$ cd tests
$ ./load-testdb.sh data/jmtest01.sql
$ psql -d jmtest01 -U jmdictdb -f ../db/updates/nnn-xxxxxx.sql
$ pg_dump jmtest01 >data/jmtest01.sql

For more details on the jmtest01 database see 3.4, “The tests database”.
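When more than one update script has to be applied, they should be applied in 'nnn' order. The reload-update-save sequence above can be sketched as a small Python driver. It is hypothetical, assumes tests/ is the working directory (hence the ../db/updates/ paths), and passes -v ON_ERROR_STOP=1 to psql so that a failing script stops the run before pg_dump overwrites data/jmtest01.sql.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def refresh_testdb_commands(update_scripts: list[str]) -> list[list[str]]:
    """Commands, run from tests/, that reload jmtest01 from the saved dump,
    apply the update scripts in 'nnn' order, then save the result."""
    ordered = sorted(update_scripts, key=lambda p: int(Path(p).name[:3]))
    cmds = [["./load-testdb.sh", "data/jmtest01.sql"]]
    for script in ordered:
        cmds.append(["psql", "-d", "jmtest01", "-U", "jmdictdb",
                     "-v", "ON_ERROR_STOP=1", "-f", script])
    # Equivalent to "pg_dump jmtest01 >data/jmtest01.sql".
    cmds.append(["pg_dump", "-d", "jmtest01", "-f", "data/jmtest01.sql"])
    return cmds


def run_commands(cmds: list[list[str]], cwd: str = "tests") -> None:
    for cmd in cmds:
        subprocess.run(cmd, cwd=cwd, check=True)  # stop at the first failure
```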

5.11. Run tests and commit the changes

Run the tests again one last time to verify everything is still working.

You did add tests for the changed behavior, right?

$ cd tests/
$ ./runtests.py

Run:

$ git status

and verify the expected files were modified. Confirm that the new nnn-xxxxxx.sql file is listed as "untracked", as well as any other new files that were added as part of the changes.

In the commit message below, please include the database update number(s) in parentheses at the end of the title line if the number was changed (e.g., "Added new table foo (db-xxxxxx)").

Commit the changed files:

$ git add db/updates/nnn-xxxxxx.sql [...any other new files...]
$ git add -u    # Adds the modified files
$ git commit
[enter the commit message; if the database update number was
 changed please include it in parentheses at the end of the
 title line: e.g., "Added feature foo (db-xxxxxx)".]
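If you want to check a commit message for this convention before committing (for instance from a commit-msg hook), a check could look like the sketch below. It is hypothetical, looks only at the title line, and assumes that several numbers, when present, are written comma-separated inside one pair of parentheses, a form the text above does not actually specify.

```python
import re

# "(db-xxxxxx)" at the end of the title; extra numbers comma-separated.
_DB_TAG_RE = re.compile(r"\(db-[0-9a-f]{6}(?:,[0-9a-f]{6})*\)\s*$")


def title_has_db_tag(message: str) -> bool:
    """True if the first line of a commit message ends with (db-xxxxxx)."""
    lines = message.splitlines()
    return bool(lines) and bool(_DB_TAG_RE.search(lines[0]))
```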

At this point you can (depending on intent):

  • Push the changes to GitLab and generate a pull request to the JMdictDB project.

  • Merge the development branch into master (or your own private mainline branch) and apply the changes to your local production JMdictDB instance (see 4. Upgrading JMdictDB in the Operations Guide.)

  • If this is for EDRDG, merge the development branch into master and see 6, “Preparing a "release"”.

6. Preparing a "release"

The JMdictDB project does not produce formal releases per se; instead, after a number of new features and bug fixes have accumulated and after consultation with Jim Breen at edrdg.org (the primary user of JMdictDB), the changes made to branch "master" are merged into branch "edrdg" and both branches are pushed to GitLab. After the edrdg.org site applies the updates from the "edrdg" branch, it is tagged with "edrdg-<YYMMDD>" ("<YYMMDD>" is the year, month and day) to provide a record of what software was running there when.
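The tag name is just the date in YYMMDD form. A trivial sketch for producing it (edrdg_tag is a hypothetical helper):

```python
from datetime import date


def edrdg_tag(day: date) -> str:
    """Return the edrdg-<YYMMDD> tag name for the given date."""
    return day.strftime("edrdg-%y%m%d")


print(edrdg_tag(date.today()))
```

The result would then be used with, e.g., "git tag edrdg-230815 edrdg" followed by "git push origin edrdg-230815", assuming the GitLab remote is named "origin"; the date here is only an example.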

Before pushing to GitLab, the following should be checked:

  1. Make sure the JEL parser is up-to-date with the parser source file, jmdictdb/jelparse.y:

    $ cd {{DEVDIR}}/jmdictdb/ && make

  2. Confirm all the tests pass (see section 3, “Tests”).

  3. If possible, do a full install from XML files per 5.3. Load the database in the Installation Guide.

  4. Make sure the documentation is up-to-date:

    $ cd {{DEVDIR}}/doc/ && make

     and optionally install a local copy to ~/public_html/doc/:

    $ make install

  5. Prepare a set of upgrade instructions, which may be, for complex or major upgrades, a permanent document added to the doc/ directory or, for straightforward and routine upgrades, simply an email.

Appendix A: Additional software requirements for development

For building documentation:

  • asciidoctor: Generates .html files from doc/src/*.adoc.

  • dia: Generates the schema diagram .png file from doc/src/schema.dia.

  • LibreOffice Writer: Generates the schema.pdf file from doc/src/schema.odt.

For running tests:

  • pytest (optional): Tests are written to run with Python’s "unittest" module but Pytest is compatible and offers additional features that may be useful sometimes.

  • Selenium web driver: For the web page end-to-end tests in tests/e2e/.

  • xmlstarlet: Used by some test scripts for comparing XML files.

Appendix B: Operational tools

These are tools for use at installed JMdictDB sites and are located in bin/.

bulkupd.py

Allows making similar changes to a large number of database entries at once.

conj.py
dbcheck.py
dbreaper.py
entrs2xml.py

Produce a JMdict or JMnedict XML file from the database contents. This does the inverse of jmparse.py.

ex2txt.py

Produce a Tatoeba sentence examples file from data in a JMdictDB database. This does the inverse of exparse.py.

exparse.py

Parse a Tatoeba sentence examples file into a form suitable for loading into a JMdictDB database by pgload. Normally run by Makefile-db. This does the inverse of ex2txt.py.

jelload.py
jmdbss.txt
jmparse.py

Parse a JMdict or JMnedict XML file into a form suitable for loading into a JMdictDB database by pgload. Normally run by Makefile-db. This does the inverse of entrs2xml.py.

kdparse.py

Parse a kanjidic XML file into a form suitable for loading into a JMdictDB database by pgload. Normally run by Makefile-db. There is not currently a program to produce a kanjidic XML file from database data.

mklabels.py
pgload.py

Loads the intermediate file produced by jmparse.py, kdparse.py or exparse.py into a JMdictDB database. Normally run by Makefile-db.

shentr.py

A command line program that can display entries from a JMdictDB database.

users.py

A program for managing (adding, deleting, modifying) users in the jmsess database.

xresolv.py

This is run after pgload.py loads a corpus into a JMdictDB database and attempts to resolve (produce xrefs from) the unresolved xrefs generated by pgload.py. Normally run by Makefile-db.

Appendix C: Build and development tools

These are programs that are used in the development of JMdictDB, either for debugging or as part of the toolchains used for packaging and distributing the software.

Debugging and development tools

dbcompare.py
dbg-parser.py
dbversion.py

Show the database version number of a chosen database or all JMdictDB databases and whether or not it is compatible with the current JMdictDB code.

dtdcheck.py
hggit.py

Maps between old Mercurial and current Git revision numbers.

jmbuild.py
jmextract.py
kwcmp.py
run-flask.py

Run the JMdictDB Flask app in debugging mode.

upd-version.py

Update the jmdictdb/version_.py file based on the current Git revision number.

xmlarch.sh

A script that can be run regularly by cron to check a JMdict or JMnedict XML file and save a copy of the DTD if it has changed since the last run.

Build tools

These are part of the build toolchains and usually invoked by a Makefile or script; they seldom need to be manually run.

install.sh

Run by Makefile to install command line programs and web server software. Replaces the target file only if the source and target file contents differ (as determined by checksum).

gen_parsetab.py
yply.py

Used in the build process by jmdictdb/Makefile to convert the YACC grammar that jmdictdb/jelparse.y is written in, into a format understood by the ply parser-generator.
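The replace-only-if-different behavior described for install.sh can be expressed in a few lines of Python. This is an illustration of the idea, not a translation of install.sh (a shell script whose actual checksum tool is not stated here); SHA-256 is an arbitrary choice and install_if_changed is a hypothetical name.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def install_if_changed(src: Path, dst: Path) -> bool:
    """Copy src to dst only if dst is missing or its contents differ.
    Returns True if a copy was made."""
    if dst.exists() and file_digest(dst) == file_digest(src):
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)          # also preserves permissions and mtime
    return True
```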

One-time or obsolete tools

lic-replace.py

A quick hack that was used to replace the full GPL license text in source files with a short SPDX license identifier.

mkiso639maps.py
mkkwmod.py
mklang.py

A one-time use tool to generate the jmdictdb/data/kwlang.csv file.

Appendix D: Template for database (replacement) update script

Use this template if the database changes being made will result in errors if a pre-update version of the JMdictDB code attempts to use the updated database.

\set ON_ERROR_STOP
BEGIN;

-- [Comment summarizing the update]
-- [Optional additional comments providing more detail if needed.]
-- [...]

  -- Jmdictdb schema version id(s) to update database to and current
  -- schema version id(s) required for this update to be applied.
\set dbversion  '''xxxxxx'''
\set require    '''yyyyyy'''

\qecho Checking database version, 0 rows expected...
SELECT vchk (:require);                      -- Will raise error on failure.
INSERT INTO db(id) VALUES(x:dbversion::INT); -- Make this version active.
-- This update supersedes previous updates.
UPDATE db SET active=FALSE WHERE active AND  -- Deactivate all :require.
LPAD(TO_HEX(id),6,'0') IN (SELECT unnest(string_to_array(:require,',')));

-- Do the update.

[SQL statements to implement the update go here.]

COMMIT;

Replace the comments at the top with appropriate text. Replace "xxxxxx" with the new update number.
Replace "yyyyyy" with the update number(s) the database must currently have for this update to be applied (i.e., the previous update number).
The 'xxxxxx' and 'yyyyyy' numbers can also be multiple, comma-separated numbers (e.g., '''33d0a7,b6250c''').
Replace the "[SQL statements…]" lines with the SQL statements needed to implement the changes.
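The version bookkeeping in the template converts between the 6-digit hex strings used in the scripts and the integers stored in db.id. The Python equivalents below only illustrate what the SQL expressions compute; in particular they show why LPAD(..., 6, '0') is needed, since TO_HEX does not produce leading zeros. The function names are hypothetical.

```python
from __future__ import annotations


def update_id_to_int(hex_id: str) -> int:
    """Like x'cda09c'::INT in the template: hex string -> db.id value."""
    return int(hex_id, 16)


def int_to_update_id(value: int) -> str:
    """Like LPAD(TO_HEX(id),6,'0'): db.id value -> 6-digit hex string."""
    return format(value, "06x")


def split_ids(ids: str) -> list[str]:
    """Like unnest(string_to_array(ids, ',')): split a comma-separated list."""
    return ids.split(",")


# Without the padding, id 0x00ff0a would appear as "ff0a" and never match
# the "00ff0a" in a :require list.
assert format(0x00ff0a, "x") == "ff0a"
assert int_to_update_id(0x00ff0a) in split_ids("00ff0a,b6250c")
```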

Appendix E: Template for database (supplementary) update script

Use this template if the database changes being made will not result in errors if a pre-update version of the JMdictDB code attempts to use the updated database. For example, changes to tags that aren’t referenced by the code, or the addition of database objects (e.g., tables, views) being added now in anticipation of future use by the code.

\set ON_ERROR_STOP
BEGIN;

-- [Comment summarizing the update]
-- [Optional additional comments providing more detail if needed.]
-- [...]

  -- Update version applied by this update.
\set dbversion  '''xxxxxx'''
INSERT INTO db(id) VALUES(x:dbversion::INT);

-- Do the update...

[SQL statements to implement the update go here.]

COMMIT;

Replace the comments at the top with appropriate text. Replace "xxxxxx" with the new update number.
The 'xxxxxx' number can also be multiple, comma-separated numbers (e.g., '''33d0a7,b6250c''').
Replace the "[SQL statements…]" lines with the SQL statements needed to implement the changes.

Appendix F: CGI debugging

The CGI code was deprecated following the adoption of the more efficient and lower-maintenance Flask/WSGI code in November 2021 and was removed in August 2023; this section is preserved for historical interest.

The CGI scripts can be run directly in a terminal window by giving them a URL (which can be copy-pasted from a web browser) as a command line argument:

$ cd web/cgi/
$ python3 -mpdb entr.py 'http://localhost/jmdictdbv/cgi/entr.py?svc=jmdict&e=2171804'

In the above example, the "-mpdb" runs the Python debugger which is usually useful but is optional.

Pages that normally get their parameters by a POST request and thus do not show the parameters in the URL can be coerced to do so by adding the URL parameter, "dbg=1". To debug the edconf.py (confirmation) page for example, go to the preceding edform.py (edit form) page and add the dbg parameter:

http://localhost/jmdictdbv/cgi/edform.py?e=14082&dbg=1

Click the Next button and the Edit Confirmation page will be displayed with URL parameters:

http://localhost/jmdictdbv/cgi/edconf.py?kanj=%E5%93%80%E6%AD%8C&rdng=%E3%81%82%E3%81%84%E3%81%8B&sens=%5B1%5D%5Bn%2Cadj-no%5D%0D%0A++lament+%28song%29%3B+elegy%3B+dirge%3B+sad+song%0D%0A%5B2%5D%5Bn%2Cadj-no%5D%0D%0A++Lamentations+%28book+of+the+Bible%29&reference=&comment=&name=&email=&svc=jmdict&id=14082&stat=2&dbg=1&src=1&seq=1150170&srcnote=&notes=

You can now copy-paste that into the command line and debug the edconf.py script interactively.

$ python3 -mpdb edconf.py 'http://localhost/jmdictdbv/cgi/edconf.py?kanj=%E5%93%80%E6%AD%8C&rdng=%E3%81%82%E3%81%84%E3%81%8B&sens=%5B1%5D%5Bn%2Cadj-no%5D%0D%0A++lament+%28song%29%3B+elegy%3B+dirge%3B+sad+song%0D%0A%5B2%5D%5Bn%2Cadj-no%5D%0D%0A++Lamentations+%28book+of+the+Bible%29&reference=&comment=&name=&email=&svc=jmdict&id=14082&stat=2&dbg=1&src=1&seq=1150170&srcnote=&notes='

The "dbg" parameter, once added, is passed along to all subsequent pages.

Appendix G: Construction of the jmtest01 test database

This is for the historical record.

The static test database was developed by identifying the entries used in the live tests (plus some additional ones that provide xref targets) and extracting them from a recent jmdict XML file, loading them into a new, empty JMdictDB database, from which a loadable copy was produced using Postgresql’s pg_dump tool. The process was:

  # Extract the entries listed in jmtest01.seq from a full jmdict XML
  #  file and save them as jmtest01.xml in the tests/ directory.
$ tools/jmextract.py data/jmdict-190430.xml \
   -s python/tests/data/jmtest01.seq >python/tests/data/jmtest01.xml
$ cp python/tests/data/jmtest01.xml data/jmdict.xml
  # Create a "jmnew" database from the test data xml file.
$ make jmnew
$ make loadjm     # Loads jmdict.xml into new database "jmnew"
$ make postload
  # Dump the "jmnew" database containing the test data so it can
  #  be reloaded later on demand when running tests.
$ pg_dump -d jmnew >python/tests/data/jmtest01.sql
  # Drop the old "jmtest01" so that the new one will be loaded
  # next time the tests are run.
$ dropdb jmtest01
  # Run the tests.  The first time they are run, they should reload
  # the test database from the new jmtest01.sql file.
$ cd python/tests && python3 runtests.py

The file python/tests/data/jmtest01.seq is the list of entry sequence numbers that was determined to be needed by the tests.