ENAMDICT/JMnedict

Japanese Proper Names Dictionary Files

Copyright (C) 2017 The Electronic Dictionary Research and Development Group

Introduction

The ENAMDICT/JMnedict files contain Japanese proper names: place-names, surnames, given names, company names, names of artistic and literary works, product names, etc. These were originally included in the EDICT file, along with the non-name entries. By late 1995, the name entries outnumbered the others and the file was becoming unmanageably large, so the decision was made to split it. From this split came the ENAMDICT file.

The JMnedict (Japanese Multilingual Named Entity Dictionary) was initially just the ENAMDICT file reformatted as an XML file in UTF-8 encoding. It also included a small number of names which use kanji from the JIS X 0212 character set.

In 2016 the JMnedict file was included in the online database system used for maintaining and distributing the JMdict/EDICT dictionaries, and XSLT utilities were developed for extracting the original ENAMDICT format file from it.

Format

The format of the ENAMDICT file is similar to that of the EDICT file, and the EDICT documentation should be consulted for more information.

The names have classification codes associated with them. The codes are:

s - surname (138,500)
p - place-name (99,500)
u - person name, either given or surname, as-yet unclassified (139,000) 
g - given name, as-yet not classified by sex (64,600)
f - female given name (106,300)
m - male given name (14,500)
h - full (usually family plus given) name of a particular person (30,500)
pr - product name (55)
c - company name (34)
o - organization name
st - stations (8,254)
wk - work of literature, art, film, etc.

These codes are at the front of each group of translations, e.g. "(f) Hiroko" or "(s) Tanaka".

In addition, a number of country-names are added in parentheses after place-names.
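
As an illustration of how the classification codes can be read from an entry, here is a minimal Python sketch. It assumes the EDICT-style line layout (headword, optional reading in square brackets, then translations separated by slashes), and it assumes that several codes may appear together separated by commas; both assumptions should be checked against the EDICT documentation. The function name parse_line and the sample lines are hypothetical.

```python
import re

# Classification codes as listed in this document.
NAME_CODES = {
    "s": "surname",
    "p": "place-name",
    "u": "person name, as-yet unclassified",
    "g": "given name, as-yet not classified by sex",
    "f": "female given name",
    "m": "male given name",
    "h": "full name of a particular person",
    "pr": "product name",
    "c": "company name",
    "o": "organization name",
    "st": "station",
    "wk": "work of literature, art, film, etc.",
}

# Assumed layout: HEADWORD [READING] /(codes) translation/.../
LINE_RE = re.compile(
    r"^(?P<head>\S+)(?:\s+\[(?P<reading>[^\]]+)\])?\s+/(?P<body>.*)/\s*$"
)
# Leading "(s)" or "(p,s)" style group at the front of a translation.
CODES_RE = re.compile(r"^\((?P<codes>[a-z]+(?:,[a-z]+)*)\)\s*(?P<text>.*)$")


def parse_line(line):
    """Split one ENAMDICT-style line into headword, reading and translations.

    Returns None if the line does not match the assumed layout.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    translations = []
    for group in m.group("body").split("/"):
        if not group:
            continue
        codes, text = [], group
        cm = CODES_RE.match(group)
        # Only treat the parenthesised prefix as codes if every item is known.
        if cm and all(c in NAME_CODES for c in cm.group("codes").split(",")):
            codes = cm.group("codes").split(",")
            text = cm.group("text")
        translations.append({"codes": codes, "text": text})
    return {
        "headword": m.group("head"),
        "reading": m.group("reading"),  # None for kana-only headwords
        "translations": translations,
    }


if __name__ == "__main__":
    for sample in ("田中 [たなか] /(s) Tanaka/", "ひろこ /(f) Hiroko/"):
        print(parse_line(sample))
```

A parenthesised prefix that is not made up entirely of known codes is left as part of the translation text, so a country-name or other remark in parentheses is not mistaken for a classification code.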

The JMnedict is structured according to its DTD, which is at the front of the file.
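
The following Python sketch shows one way the XML form might be read with the standard library. The element names (entry, ent_seq, k_ele/keb, r_ele/reb, trans, name_type, trans_det) are assumptions and must be verified against the DTD at the front of the file; the sample document and its sequence number are invented for illustration. In the distributed file the name type may be written as an entity declared in the DTD rather than as plain text.

```python
import xml.etree.ElementTree as ET

# Invented sample; element names are assumptions to check against the DTD.
SAMPLE = """<JMnedict>
<entry>
<ent_seq>1000000</ent_seq>
<k_ele><keb>田中</keb></k_ele>
<r_ele><reb>たなか</reb></r_ele>
<trans><name_type>surname</name_type><trans_det>Tanaka</trans_det></trans>
</entry>
</JMnedict>"""


def entry_to_dict(entry):
    """Collect the kanji forms, readings and translations of one entry."""
    return {
        "kanji": [e.text for e in entry.findall("k_ele/keb")],
        "readings": [e.text for e in entry.findall("r_ele/reb")],
        "translations": [
            {
                "types": [t.text for t in trans.findall("name_type")],
                "texts": [d.text for d in trans.findall("trans_det")],
            }
            for trans in entry.findall("trans")
        ],
    }


root = ET.fromstring(SAMPLE)
entries = [entry_to_dict(e) for e in root.findall("entry")]

if __name__ == "__main__":
    print(entries)
```

For the full file, which is large, an incremental parser such as ET.iterparse would likely be a better fit than loading the whole tree at once.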

Updating

Originally the names file was held in text form and updated by Jim Breen. Since 2016 it has been updated via an online database system. If you wish to submit an entry, use this link and select "jmnedict" from the drop-down Corpus menu.

Downloads

The files can be downloaded from the Monash ftp site: enamdict.gz and JMnedict.xml.gz.

Jim Breen
The Electronic Dictionary Research and Development Group.
December 2013
June 2014
July 2017

Information about the formal usage arrangements for JMnedict/ENAMDICT can be found on the Group's WWW page.

APPENDIX

ENAMDICT COPYRIGHT STATEMENT

In March 2000, James William Breen assigned ownership of the copyright of the dictionary files assembled, coordinated and edited by him to The Electronic Dictionary Research and Development Group.

Information about the formal usage arrangements for ENAMDICT can be found on the Group's WWW page (http://www.edrdg.org/).

In summary, JMnedict/ENAMDICT can be freely used provided satisfactory acknowledgement is made, and a number of other conditions are met.