INSTALLATION

Although this software was written and is maintained primarily to support Jim Breen’s JMdict and wwwjdic projects, you may wish to install a local copy of this software to:

  • Run a local JMdictDB server for your own use.

  • Contribute development work to the JMdict project.

  • Use or adapt the code to a project of your own.

This document describes how to setup a functional instance of a JMdictDB server including loading dictionary data.

Requirements

The code is currently developed and tested on Ubuntu Linux using Apache as a web server and Postgresql as the database server. The webserver should be configured to run Python CGI scripts.

Regarding Microsoft Windows: Up to mid 2014 the code also ran and was supported on Microsoft Windows XP. However, lack of access to a Windows machine has required dropping Windows support. Please be aware that lingering references to Microsoft Windows in this and other documentation are outdated and unsupported.

JMdictDB requires Python 3; Python 2 is no longer supported although the last working Python 2 version is available in the code repository in the branch, "py2".

Some additional Python modules are also needed. Version numbers are the versions currently in use in the author’s development environment — the software may work fine with earlier or later versions, but this has not been verified.

Python [3.6] (known not to work before 3.3).
Additional Python packages:
  psycopg2-2.7.3 Python-Postgresql connector.
    http://initd.org/projects/psycopg2/
  ply-3.9 -- YACC'ish parser generator.
    http://www.dabeaz.com/ply/
  lxml-4.0.0 -- XML/XSLT library.  Used by xslfmt.py for doing
    xml->edict2 conversion.
  jinja2-2.9.6 -- Template engine for generating web pages.
Postgresql [10.3] -- Database server
Apache [2.4] -- Web server
make [4.1] -- Gnu make is required if you want to use the provided
  Makefile's to automate parts of the installation.
wget -- Used by Makefile-db to download the dictionary XML files.
  If not available, you can download the needed files manually.

Database Authentication

We use the term "database server" when referring to the entire Postgresql instance. JMdictDB uses multiple, separate databases within the database server. For example, the database "jmdict" is typically the "production" database while database "jmnew" is used for loading dictionaries from source files. There may be a "jmtest" database and so on.

Any program that accesses the database server needs a username and possibly a password to do so. The database server is accessed by the JMdictDB system in two contexts:

  • When cgi scripts are executed by the web server.

  • When Postgresql and JMdictDB command line tools are run by a user. This includes their execution by the makefiles during installation.

In the first case the cgi scripts will use database usernames and passwords defined in the config-pvt.ini file which will be described in more detail below.

In the second case, the database username and password may be given on the command line, environment variables, or a .pgpass file. The last option is usually the most convenient and secure and is the method documented below. A detailed discussion of Postgresql authentication methods is out of scope here but [note1] provided some further references.

Editor Authentication

JMdictDB provides a separate, application-level authentication scheme for web access. Users can login as Admin, Editor or access the system anonymously by not logging in. The CGI scripts allow anonymous users to submit unapproved edited or new entries, but to approve or reject entries, a user must be logged in as an Editor. A user logged in as Admin can additionally manage other users.

JMdictDB users and their access levels are store in a separate separate database named "jmsess". This database need only be setup once.

Procedure

Setting up a JMdictDB instance involves:

  1. Download and unpack (or clone) the JMdictDB software into a local directory.

  2. Configure the Postgresql database server.

  3. Use the JMdictDB tools to load the database with dictionary data.

  4. Configure the Apache web server.

  5. Deploy the JMdictDB software. This step installs the jmdictdb library package, command line programs and web/cgi scripts to locations independent of the code in the download directory allowing the latter to be changed or deleted without affecting the operation of the installed version.

  6. Create a config.ini file for CGI settings

A makefile is provided that automates loading the database (Makefile-db) and installing the JMdictDB software (Makefile). It is presumed that there is a functioning Postgresql server instance, and that you have access to the database server’s "postgres" account or some account with enough privileges to create and drop databases.

1. Getting the code

There are two main branches in the code:

  • master: the latest version and the branch new development should generally be based on.

  • edrdg: the version currently running at edrdg.org

Clone the JMdictDB repository at GitLab:

$ git clone https://gitlab.com/yamagoya/jmdictdb.git <DEV>

or, you can download a tar or zip archive at:

(The download button is just to the left of the green Clone button.)

2. Configure the Postgresql database server

2.1. Provide access to the Postgresql server

JMdictDB accesses the Postgresql database using two dedicated Postgresql database user accounts, by default named 'jmdictdb' (for read-write access) and 'jmdictdbv' (for read-only access) although they can be changed in Makefile-db.

Create a .pgpass file which will allow access to the database server without the need to enter a password each time a database command is run:

Choose passwords for the 'jmdictdb' and 'jmdictdbv' database user accounts and determine the password (if one in needed) for the 'postgres' user (or whatever the account specified by PG_SUPER in Makefile-db is.) This user is used when creating the 'jmdictdb' and 'jmdictdbv' users.

In your home directory create a file named .pgpass with mode 600 and contents:

localhost:*:*:postgres:xxxxxx
localhost:*:*:jmdictdb:yyyyyy
localhost:*:*:jmdictdbv:zzzzzz

where "xxxxxx", "yyyyyy" and "zzzzzz" are the passwords for the database "super user" and the two JMdictDB access users respectively. If you changed PG_SUPER replace 'postgres' above with the changed value. This will prevent Postgresql from prompting you for passwords dozens of times.

Important
If the file mode is not 600 Postgresql will ignore the file.

2.2. Create the JMdictDB Postgresql users

$ make -f Makefile-db init

You will be prompted for the password to use for the new 'jmdictdb' database account. Use the same passwords as entered above in the ~/.pgpass file.

Enter password for new role: yyyyyy
Enter it again: yyyyyy

The same process is repeated for the 'jmdictdbv' account (use the zzzzzz password this time.)

Important
You need (and should) only do the 'make -fMakefile-db init' step once when installing JMdictDB on a machine for the first time, even if you install the JMdictDB software multiple times.

3. Load the database

By default, the main "production" database is named "jmdict". Other databases are used when loading data, for testing, etc. The makefile targets that load data do so into a database named "jmnew" so as to not damage any working database in the event of a problem. A make target, "activate" is provided to move a newly loaded database to "jmdict".

The process is:

$ make -f Makefile-db jmnew

Repeat the following, where "loadXX" is one of loadjm (JMdict), loadne (JMnedict), loadkd (kanjidic2), loadex (Tatoeba examples) for as many of the those sources as you want to load. Each of the "loadXX" targets will download the appropriate source file to the data/ directory, parse it and load the data into the "jmnew" database.

$ make -f Makefile-db loadXX

Then as the last step:

$ make -f Makefile-db postload

As a shortcut, the target loadall will do the above for all four of the dictionaries.

Caution
No provision is made for concurrent access while loading data; we assume that only the access to the database being loaded is by the procedures used for the loading. However, use of databases other than the one being loaded (which is usually "jmnew") can continue as usual during loading.

If everything went well you can do:

$ make -f Makefile-db activate

That will rename the current "jmdict" database (if any) to "jmold" and rename the "jmnew" database that was just loaded to "jmdict".

4. Deploy the JMdictDB software

At this point you have three ways to make make the database command line tools and and cgi web pages available:

  1. Serve them directly from your development directory.

  2. Install them to standard locations in your home directory.

  3. Install them to standard system-wide locations.

Any or all of the above can be done and each install is independent of the others. In particular you can do development work in your development directory while using a previous install to your user directories as a reference, and an earlier system install to a production location is available to all. If you have no need to do development work you can delete your development directory after doing the install in either option #2 or #3.

We will describe only options #2 and #3 here; option #1 is described in the DEVELOPMENT document.

The JMdictDB software is deployed system-wide by:

# make -f Makefile install-sys

and for you as an individual user by:

$ make -f Makefile install-user

Note that when installing system-wide you must execute the 'make' command as a 'root' user with, for example, 'sudo'.

You can also leave out the "-f Makefile" part on those commands since 'make' uses that file by default.

The 'install-sys' target will, by default, install to the following locations:

  • Web files

    /var/www/jmdictdb/

    CGI scripts

    /var/www/jmdictdb/cgi-bin/

    Admin files

    /var/www/jmdictdb/lib/

    Command line programs

    /usr/local/bin/

    Python library modules

    /usr/local/lib/pythonX.Y/dist-packages/ [*]

For 'install-user' the locations are:

  • Web files

    ~/public_html/

    CGI scripts

    ~/public_html/cgi-bin/

    Admin files

    ~/public_html/lib/

    Command line programs

    ~/.local/bin/

    Python library modules

    ~/.local/lib/pythonX.Y/site-packages/jmdictdb/ [*]

    [*] — The exact location is determined by Python. X and Y are the major and minor version numbers of the installed Python.

For either target you can override the default locations by overriding the appropriate 'make' variable. WEBROOT controls the location of the web components. For example:

# make WEBROOT=/srv/jmdictdb/ install-sys

will install the web components to /srv/jmdictdb/, srv/jmdictdb/cgi-bin/, etc. The locations of the non-web components (command line programs, python library modules) are not altered.)

Next you will need to tell the Apache web server where to find the web components and at what URL to serve them.

5. Configuring the Apache webserver

For exposition we’ll assume the JMdictDB pages will be served at the url http://myhost.org/jpdicts/ and that the JMdictDB web components have been installed at the default location of /var/www/jmdictdb/.

Create an Apache configuration section like the following:

ScriptAlias /jpdicts/cgi/ /var/wwww/jmdictdb/cgi-bin/
Alias       /jpdicts/     /var/wwww/jmdictdb/
<Directory /var/wwww/jmdictdb/>
  AllowOverride All
  SetEnv LANG en_US.utf8
  Options Includes FollowSymLinks ExecCGI
  Require all granted
</Directory>

This can go in the main Apache configuration file, or at most sites, in a separate file in a configuration directory. Refer to the Apache documentation for specifics.

Restart the web server:

# systemctl restart apache2

If you point your web browser to http://localhost/jpdicts/cgi/srchformq.py you should see a JMdictDB page with an error message about the config.ini file not being found. Which is correct since we’ve not set up a config file yet.

6. CGI config.ini and log files

6.2 config.ini

The config.ini file contains settings used by the CGI scripts when serving web pages. At a minimum you’ll need to provide settings that describe each database you want to be accessible via the web.

There must be config.ini file. There may also be a config-pvt.ini file. The latter is intended for database access credentials that should have access limited to the system maintainer and the webserver. However the database access credentials can go into config.ini if you wish (but be sure to protect config.ini appropriately in that case.)

Create a config.ini file (use the config-sample.ini file as a guide). The config.ini file should go in $(WEBROOT)/lib/ (the "Admin files" directory in step 4. Deploy the JMdictDB software).

The config.ini file need contain only settings where the value differs from the setting’s default value. All the settings and their default values are described in config-sample.ini. The config.ini file may be empty if no changes from the defaults are needed but it must be present.

The config-pvt.ini file should be readable by the web server process and not readable by world (it will contain database passwords.) The same applies to config.ini if you’ve chosen to place the database access setting in there instead of in config-pvt.ini.

6.2 Log file

The $(WEBROOT)/lib/ directory is the default location for the cgi log file. However both the name and location can be set in vonfig.ini.

The log file must exist for logging to occur; it will not be automatically created. It also must be writable by the web server process.

6.3 Verify CGI access

You should now be able to view any of the JMdictDB web pages in your web browser.

If you did a system install in step 4 the URL will be:

http://localhost/<VDIR>/cgi-bin/srchform.py

where <VDIR> is the web server virtual directory configured in step 5 (jpdicts in that example).

If you did a user install in step 4, the URL will be:

http://localhost/~<USERNAME>/cgi-bin/srchform.py

where <USERNAME> is your username.

The above URLs will need adjustment if you overrode any of the default locations when installing.

7. Common errors that may occur when accessing a web page

  1. Server Error

    Check the web server’s error log. There will often be a Python stack dump from the cgi script that will identify the problem.

  2. file not found: …​/config.ini

    There needs to be a lib/config.ini file, even if it is empty.

  3. postgresql authentication error / fe_sendauth: no password supplied

    Example:

    svc=db_foo
    postgresql authentication error
    fe_sendauth: no password supplied

    The db access section named [db_foo] is either missing from the config-pvt.ini or config.ini file or it is present but the username and/or password for the database server are wrong.

  4. api requires updates xxxxxx / db has updates xxxxxx

    The version of the API (in jmdictdb/dbver.py) in different than the database version (in table "db" or more conveniently viewable in view "dbx"). If the API version is newer, you need to apply the appropriate updates from db/updates/. If older, since there is no easy way to "downgrade" a database, you’ll need to find an older database of the correct version or load a database from source files (eg JMdict XML).

  5. No entries in jmdictdb log file

    • Check the web server’s error log file. If a cgi script can’t open the error log for writing it will write a message to that effect to srderr which should appear in the web server error log file.

    • Check the config.ini file. The default log file name and location is web/lib/jmdictdb.logbut can be overridden in the config.ini file.

    • Make sure the log file is writable by the web server process.

Notes

Note 1

For more information on Postgresql usernames, passwords and the .pgpass file, see the Postgresql docs:

33.15 Client Interfaces / libpq / The Password File

https://www.postgresql.org/docs/10/libpq-pgpass.html

31.1 Client Interfaces / libpq / Database Connection - Control Functions

https://www.postgresql.org/docs/10/libpq-connect.html

20 Server Administration / Client Authentication

https://www.postgresql.org/docs/10/client-authentication.html

sec VI Reference / Postgresql Client Applications / psql / Usage / Connecting to a Database

https://www.postgresql.org/docs/10/app-psql.html#R2-APP-PSQL-CONNECTING

Note that chapter numbers may vary between Postgresql versions and these are for Postgresql version 10.