November 2021 WSGI Upgrade

Instructions for updating a production JMdictDB server

This document provides instructions to configure an existing JMdictDB site to utilize the new WSGI/Flask framework backend. If you are installing JMdictDB for the first time please see the Installation Guide instead. Please note that the CGI backend remains fully functional for now but is deprecated and will be removed in a future update.

With this update:

The CGI pages will continue working unchanged at their current URLs. Access to the WSGI-served pages is provided by a different set of URLs. (Note that logins are not shared, being logged in on a CGI page will not make you logged in on a WSGI page.)
The WSGI code uses the same page templates and same "guts" code as the CGI pages; pages served via WSGI should look and act nearly identically to the CGI versions.

With CGI, when an HTTP request arrives at the web server the server starts a new instance of the Python interpreter to run the CGI script that handles that particular kind of request. Because Python is a large, resource hungry program, starting it anew for every request is expensive in terms of both time and resources.

With WSGI, a Python interpreter running the Flask framework is started when the web server starts. ^[1] Each incoming HTTP request is handled by, in effect, calling a function in the already running Python process which is much faster and less resource hungry than starting a new Python process each time.

Additionally, the Flask web framework allows simplifying the JMdictDB server code and improving its security by providing better support for things like session management and cookies.

1. Transition stages

Complete transition from the CGI to the WSGI backend may be done in stages to gain confidence at each stage before moving to the next and to preserve fallback capability in case unforeseen problems are encountered. Here is one possible roadmap.

This document addresses only Stage 1; the timing and details of the other stages can be determined later.

Stage 1: Activate the WSGI backend on a new set of URLs, leaving the current CGI URLs unchanged. The new URLs can be provided to users for testing and feedback while others continue to use the CGI URLs as they do now.
Stage 2: After confidence has been established in the compatibility and reliability of the WSGI version, web server redirects can be put in place to redirect all users to the WSGI version via the existing CGI URLs.
Stage 3: After a suitable advance notice period to allow users time to update bookmarks, etc., the redirects can be removed and the WSGI URLs will now provide the only access to the JMdictDB web backend.

2. Operational and user visible changes

The following are the major differences from the CGI version that will be encountered with the WSGI version.

2.1. Editors required to enable cookies

Logged in editors will have to have cookies enabled for the JMdictDB site in order to stay logged in. Login/logout services are now provided by the Flask framework using secure, encrypted cookies to maintain session state; supplying SID values in the URL will no longer work.

As before with CGI, cookies are not required for non-logged in users.

2.2. Client-side sessions

As mentioned, sessions (which maintain login state) are now maintained in browser cookies rather than on the server. Logins from different browsers or different cookie domains on the same browser (e.g., a normal tab and a "private" tab) will result in independent logins — logging out in one will not affect the others.

2.3. Code changes aren’t immediately effective

When the JMdictDB code is upgraded or otherwise modified the changes will not be immediately visible because the code running in the web server was loaded before the changes and is not automatically reloaded. To reload with the new code:

Change the modified-date on the jmdictdb.wsgi file with, for example, the 'touch' command.

This will cause a restart of the jmwsgi Python processes and a reload of the JMdictDB code. Alternatively and more intrusively:

Restart the Apache webserver.

2.4. Different config and log file names.

One can use the same configuration and log files for the CGI and WSGI server versions (the format of the contents of both are the same) but it can help avoid confusion if they are kept separate, at least initially. The instructions below do that.

3. Procedure

Note	The JMdictDB software and the two major components it relies on, Apache web server and Postgresql database server are highly configurable. These instructions are based on a site configured per the Installation Guide and will likely require adaptation for the specifics of a particular site.

In this section the following are used as placeholders and should be substituted with actual values.

{{DEVDIR}} — The local directory into which the JMdictDB software has been cloned from the GitLab JMdictDB repository. Example: /home/me/jmdictdb-dev
{{WEBROOT}} — The directory to which the JMdictDB web files are installed and from where the web server has been configured to serve them. Example: /usr/local/apache2/jmdictdb
{{WSGI}} — A directory under {{WEBROOT}} that will hold the jmdictdb.wsgi script. This may be an existing directory such as the cgi/ or cgi-bin/ directory or a new directory such as wsgi/. More details in section: Create a .wsgi file.
{{URLROOT}} — The virtual directory in which the JMdictDB pages will appear in URL-space. Example: /jmwsgi (the URLs for the JMdictDB pages will, assuming a host name of edrdg.org for example, start with https://edrdg.org/jmwsgi/)

{{WEBROOT}} should be an absolute directory path with a leading "/" character'. {{URLROOT}} should also include a leading "/" character.

3.1. Install prerequisite software packages

The new WSGI/Flask web backend adds two additional software packages to the requirements for JMdictDB (see Install: Requirements):

mod_wsgi module for Apache (https://modwsgi.readthedocs.io/en/master/)
flask package for Python3 (https://pypi.org/project/Flask/)
python-dateutil package for Python3 (https://pypi.org/project/python-dateutil/)

These should be installed via the usual methods for your operating system.

3.2. Perform a normal code upgrade.

If you have a checkout of JMdictDB already:

cd {{DEVDIR}}
git checkout edrdg
git pull

Otherwise you can clone the JMdictDB repository:

$ git clone https://gitlab.com/yamagoya/jmdictdb.git {{DEVDIR}}
$ cd {{DEVDIR}}

The following 'make' command needs to be run as root, either from a root login or via sudo. It installs the JMdictDB Python package to a (Python determined) system-wide location, the command line programs to /usr/local/bin/, and updates any CGI and WSGI scripts.

# cd {{DEVDIR}}
# sudo make WEBROOT={{WEBROOT}} install-sys

The WEBROOT=… part can be left out if you want to install to the default location of /var/www/jmdictdb/.

3.3. Create .ini, .log and .wsgi files.

While it is possible for both the CGI and WSGI backends to share the same log and configuration files, it is less confusing during the transition period to keep them separate.

3.3.1. Create a new jmwsgi.log log file.

The name, "jmwsgi.log" may be changed to whatever name is preferred. The file’s permissions must allow the web server process to write to it.

# cd {{WEBROOT}}/lib/
# touch jmwsgi.log
# chgrp www-data jmwsgi.log
# chmod 664 jmwsgi.log      # Can use 660 if preferred.

3.3.2. Create a jmdictdb.ini config file based on the CGI one.

Assuming the CGI config file is named config.ini:

# cd {{WEBROOT}}/lib/
# cp config.ini jmdictdb.ini

You can use a name other than jmdictdb.ini but will need to make a corresponding change in the .wsgi file (see below).

3.3.3. Edit jmdictdb.ini and set the log file name.

Edit the new jmdictdb.ini and change the name of the log file to match what was chosen above (e.g., jmwsgi.log). For example, change:

[logging]
  ...
LOG_FILENAME = jmdictdb.log

to:

[logging]
  ...
LOG_FILENAME = jmwsgi.log

The name of the private ini file (typically config-pvt.ini) is also set in the configuration file but can remain the same since it can be shared by both the CGI and WSGI backends (both will be accessing the same databases with the same credentials.)

3.3.4. Add a new item to the config-pvt.ini file

Edit the config-pvt.ini file and add a new section above the db_* sections. (See the config-pvt.ini-sample file for an example.)

[flask]
key = xxxxxxxxxxxxxxxx

but replace the "xxxxxxxxxxxxxxxx" with a string of 16 to 32 random characters. You can generate such a string using one of the many online password generators such as: https://passwordsgenerator.net/

3.3.5. Create a .wsgi file

This file is a shim between Apache and the JMdictDB software. Its name is specified in the Apache configuration directives (see below) and its job is to load the JMdictDB Flask module into Apache’s mod_wsgi processes when they are started. It can be placed in any directory the web server has been configured to execute a wsgi script from. The existing CGI script directory may be convenient if it allows the execution of WSGI scripts, or you can create a new directory, for example: {{WEBROOT}}/wsgi/.

# cd {{WEBROOT}}/{{WSGI}}/

Create a file, jmdictdb.wsgi, with the following contents:

import sys, os
import jmdictdb
sys.wsgi_file = __file__   # See comments in views/cgiinfo.py.
if not os.environ.get('JMDICTDB_CFGFILE'):
    p = os.path
    our_directory = p.dirname (__file__)
    default_cfgfile = p.normpath(p.join (our_directory,'../lib/jmdictdb.ini'))
    os.environ['JMDICTDB_CFGFILE'] = default_cfgfile
from jmdictdb.flaskapp import App as application

If you placed the .wsgi file in a directory other than a sibling directory of {{WEBROOT}}/lib/ or you chose to use a filename other than jmdictdb.ini, you will need to adjust the relative path and/or filename in the default_cfgfile=… line in the .wsgi file above.

3.4. Apache web server configuration

Use the Apache configuration directives below. They can go either into a separate .conf file (e.g., jmwsgi.conf) in the Apache directory for such files, or can be added to an existing configuration file (you may have an existing jmdictdb.conf file for example.) If an existing .conf file is modified, it would be a good idea to save a copy of the original file before modification in case Apache does not like the changes.

Replace {{WEBROOT}}, {{WSGI}} and {{URLROOT}} with the appropriate values. In particular, {{URLROOT}} is the top level virtual directory you want the JMdictDB pages to be served under (via WSGI). For example, using "/jmwsgi" will result in the JMdictDB pages being available at https://edrdg.org/jmwsgi/… Also note that the paths in the Alias directive must end with a "/" character.

WSGIDaemonProcess jmwsgi processes=2 threads=10 \
    display-name=apache2-jmwsgi locale=en_US.UTF-8 lang=en_US.UTF-8
WSGIProcessGroup jmwsgi
WSGIScriptAlias {{URLROOT}} {{WEBROOT}}/{{WSGI}}/jmdictdb.wsgi \
    process-group=jmwsgi

  # Serve static files directly without using the app.
Alias {{URLROOT}}/web/ {{WEBROOT}}/
<Directory {{WEBROOT}}>
    DirectoryIndex disabled
    Require all granted
    </Directory>

Important

You must also arrange for Apache to access and execute the jmdictdb.wsgi as a WSGI script. It you have placed it in the CGI directory and that is already configured to execute scripts based on their extension, you may need nothing more. If you’ve placed it in a new directory, you may need a <DIRECTORY> section for it and a directive like "SetHandler wsgi-script". Please refer to the Apache documentation for details.

The above file defines the URL for the WSGI versions of the JMdictDB pages in the Alias line. Using a host name of edrdg.org for example, the WSGI version of the search page would be at https://edrdg.org/jmwsgi/srchform.py

The number of process and threads can be adjusted depending of server capacity (number of cores, amount of memory, etc) and expected request load. For more information see the mod_wsgi documentation: https://modwsgi.readthedocs.io/en/latest/user-guides/processes-and-threading.html

3.5. Restart the web server

This will cause the web server to read the new configuration. After the web server has been restarted pointing a browser to the URL (and again using a host name of edrdg.org as an example):

https://edrdg.org/jmwsgi/srchform.py

should result in the Advanced Search Page being shown. In the bottom right corner of the page you should see: "wsgi/DB=jmdict".

Caution

The WSGI pages do not access a test version of the database; changes made via the WSGI pages will appear in the production database just as if they’d been made through the CGI version.

Note	If the web server won’t restart, the web server error logs will have more information. Removing the new Apache configuration file (or restoring the unmodified version if an existing file was modified) should get the server back up while the problem is investigated.

1. Actually mod_wsgi in the web server will typically start several Python processes, each with several threads to which incoming requests are assigned but the idea and effect are the same.