PyTorch NER Model: Step-by-Step

Developing a Neural NER Model in PyTorch: A Step-by-Step Guide

Task

Question:

Named Entity Recognition (NER) is an important information extraction task that requires identifying and classifying named entities in a given text. The entity types are usually predefined, for example location, organization, person, and time. In this exercise, you will learn how to develop a neural NER model in PyTorch. In particular, you will learn:

(a) How to prepare data (input and output) for developing a NER model.

(b) How to design and fit/train a neural NER model.

(c) How to use the trained NER model to predict named entity types for a new text.

You can use this code as your codebase and build on top of it. This code implements the CNN-LSTM-CRF NER model described in [1]. The model has an architecture as shown in Figure 1.

The dataset for this exercise is the standard CoNLL NER dataset, which is also available in the codebase (inside the ‘data’ directory). Please do the following:

(i) Download the code repository. The dataset should have three files: eng.train, eng.testb, and eng.testa, used for training, testing, and validation, respectively. The dataset contains four different types of named entities: PERSON, LOCATION, ORGANIZATION, and MISC; it uses the BIO tagging scheme introduced in the lecture. Familiarize yourself with the data and the BIO tagging scheme used in the data.
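
To make the BIO scheme concrete, here is a minimal sketch that turns a tag sequence into entity spans. The function name bio_to_spans, the example sentence, and the abbreviated tag names (PER, LOC, ORG) are illustrative assumptions, not taken from the codebase; check the data files for the exact tag strings.

```python
def bio_to_spans(tags):
    """Convert BIO tags into (entity_type, start, end) spans; end is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        # The current span ends at "O", at any "B-" tag,
        # or at an "I-" tag whose type differs from the open span.
        ends_span = (
            tag == "O"
            or tag.startswith("B-")
            or (tag.startswith("I-") and tag[2:] != etype)
        )
        if ends_span:
            if etype is not None:
                spans.append((etype, start, i))
            start, etype = None, None
        if tag != "O" and etype is None:
            # Open a new span. An "I-" tag with no open span is tolerated
            # and treated as the beginning of an entity.
            start, etype = i, tag[2:]
    if etype is not None:
        spans.append((etype, start, len(tags)))
    return spans


# Hypothetical example sentence with abbreviated tag names.
tokens = ["U.N.", "official", "Ekeus", "heads", "for", "Baghdad", "."]
tags = ["B-ORG", "O", "B-PER", "O", "O", "B-LOC", "O"]
spans = bio_to_spans(tags)
for etype, start, end in spans:
    print(etype, tokens[start:end])
```

B- marks the first token of an entity, I- marks a continuation of the same entity, and O marks tokens outside any entity, so two adjacent entities of the same type stay distinguishable.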

(ii) Run the code and see the results. The code performs basic preprocessing steps to generate the tag mapping, word mapping, and character mapping that you should use. You should understand the preprocessing and data loading functions.
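
As a rough illustration of what such mappings look like, the sketch below builds word, character, and tag vocabularies from a toy corpus. The helper create_mapping, the special tokens <PAD> and <UNK>, and the lowercasing of words are assumptions made for this example; the codebase may name and order things differently.

```python
from collections import Counter


def create_mapping(items, specials=()):
    """Map items to integer ids: specials first, then by descending frequency."""
    counts = Counter(items)
    # Ties in frequency are broken alphabetically so the ids are deterministic.
    by_freq = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ordered = list(specials) + [w for w, _ in by_freq if w not in specials]
    item_to_id = {w: i for i, w in enumerate(ordered)}
    id_to_item = {i: w for w, i in item_to_id.items()}
    return item_to_id, id_to_item


# Toy corpus: each sentence is a list of (word, tag) pairs.
sentences = [
    [("EU", "B-ORG"), ("rejects", "O"), ("German", "B-MISC"), ("call", "O")],
    [("Peter", "B-PER"), ("Blackburn", "I-PER")],
]
words = [w.lower() for s in sentences for w, _ in s]
chars = [c for s in sentences for w, _ in s for c in w]
tags = [t for s in sentences for _, t in s]

word_to_id, id_to_word = create_mapping(words, specials=("<PAD>", "<UNK>"))
char_to_id, id_to_char = create_mapping(chars, specials=("<PAD>",))
tag_to_id, id_to_tag = create_mapping(tags)

# Encode a new sentence; unseen words fall back to the <UNK> id.
new_sentence = ["Peter", "rejects", "Brussels"]
word_ids = [word_to_id.get(w.lower(), word_to_id["<UNK>"]) for w in new_sentence]
print(word_ids)
```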

(iii) Go through the model part (get_lstm_features(..), class BiLSTM_CRF(nn.Module)). Notice that it implements two character-level encoders: one based on a convolutional neural network (CNN) and one based on an LSTM recurrent neural network. The default setting uses the CNN for character-level encoding. Please read these implementations carefully.
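
For orientation, a CNN character-level encoder typically embeds the characters of each word, convolves over them, and max-pools over character positions to get one vector per word. The sketch below shows that general pattern only; the class name CharCNNEncoder and all dimensions are hypothetical and are not the codebase's implementation.

```python
import torch
import torch.nn as nn


class CharCNNEncoder(nn.Module):
    """Illustrative character-level CNN: one fixed-size vector per word."""

    def __init__(self, n_chars, char_dim=25, out_channels=25, kernel_size=3):
        super().__init__()
        # Index 0 is assumed to be the padding character.
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # padding = kernel_size // 2 keeps the length unchanged for odd kernels.
        self.conv = nn.Conv1d(char_dim, out_channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, char_ids):
        # char_ids: (n_words, max_word_len), padded with 0
        x = self.embed(char_ids)          # (n_words, max_word_len, char_dim)
        x = x.transpose(1, 2)             # (n_words, char_dim, max_word_len)
        x = torch.relu(self.conv(x))      # (n_words, out_channels, max_word_len)
        # Max over character positions gives one vector per word.
        return x.max(dim=2).values        # (n_words, out_channels)


encoder = CharCNNEncoder(n_chars=30)
char_ids = torch.randint(1, 30, (4, 7))   # 4 words, 7 characters each
char_features = encoder(char_ids)
print(char_features.shape)
```

The resulting per-word vectors are usually concatenated with the word embeddings before they enter the word-level encoder.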

(iv) The code implements a word-level encoder with an LSTM network. In its default setting, the output layer of the network has a CRF layer. You can change it to a regular Softmax layer. For now, leave it as it is (i.e., use CRF). Your job is to replace the LSTM-based word-level encoder with a CNN layer (convolutional layer followed by an optional max pooling layer). The CNN layer should have the same output dimensions (out_channels) as the LSTM.
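
One possible shape for such a replacement is sketched below. It assumes the word-level encoder receives one sentence as a (seq_len, input_dim) tensor and that the LSTM is bidirectional, so its output size is twice the hidden size; the class name CNNWordEncoder and the dimensions 125 and 200 are hypothetical. Because NER needs one feature vector per token, the convolution and the optional pooling both use stride 1 with padding so the sequence length is preserved.

```python
import torch
import torch.nn as nn


class CNNWordEncoder(nn.Module):
    """Illustrative CNN replacement for a word-level LSTM encoder."""

    def __init__(self, input_dim, out_channels, kernel_size=3, use_pool=False):
        super().__init__()
        # Odd kernel_size with padding = kernel_size // 2 keeps seq_len unchanged.
        self.conv = nn.Conv1d(input_dim, out_channels, kernel_size,
                              padding=kernel_size // 2)
        # Optional max pooling with stride 1 so every token keeps a feature vector.
        self.pool = nn.MaxPool1d(3, stride=1, padding=1) if use_pool else None

    def forward(self, embeds):
        # embeds: (seq_len, input_dim) for a single sentence
        x = embeds.t().unsqueeze(0)       # (1, input_dim, seq_len)
        x = torch.relu(self.conv(x))      # (1, out_channels, seq_len)
        if self.pool is not None:
            x = self.pool(x)              # (1, out_channels, seq_len)
        return x.squeeze(0).t()           # (seq_len, out_channels)


hidden_dim = 200                          # assumed LSTM hidden size
word_encoder = CNNWordEncoder(input_dim=125, out_channels=2 * hidden_dim,
                              use_pool=True)
sentence_embeds = torch.randn(9, 125)     # 9 tokens
word_features = word_encoder(sentence_embeds)
print(word_features.shape)
```

Keeping the output size equal to the LSTM's means the layer that maps features to tag scores, and the CRF on top of it, can stay unchanged.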

(v) Report the test set results when you use only one such CNN layer in your network. Also report the results when you use an LSTM-based character-level encoder. In each case, report the number of parameters in your model.
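
Counting parameters can be done by summing the sizes of all trainable tensors. The sketch below compares a bidirectional LSTM with a single convolutional layer of the same output size; the helper count_parameters and the dimensions 125 and 200 are assumptions for illustration, so substitute your actual model.

```python
import torch.nn as nn


def count_parameters(model):
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


input_dim, hidden_dim = 125, 200          # hypothetical sizes

lstm = nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, bidirectional=True)
conv = nn.Conv1d(input_dim, 2 * hidden_dim, kernel_size=3, padding=1)

# LSTM, per direction: 4*H*(I + H) weights plus two bias vectors of size 4*H.
#   2 * (4*200*(125 + 200) + 8*200) = 523200
lstm_params = count_parameters(lstm)

# Conv1d: in_channels * out_channels * kernel_size weights plus out_channels biases.
#   125 * 400 * 3 + 400 = 150400
conv_params = count_parameters(conv)

print("BiLSTM:", lstm_params)
print("Conv1d:", conv_params)
```

Calling count_parameters on the full model gives the number to report, including embeddings and the CRF transition parameters.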

(vi) Now increase the number of CNN layers in your network and see the impact on your results. In each case, report the results on the test set (use the validation set to select the best model).
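
A stack of convolutional layers can be written with the number of layers as a constructor argument, which makes this comparison easy to run. The sketch below is one way to do it under the same assumptions as before (a single sentence as a (seq_len, input_dim) tensor; hypothetical sizes 125 and 400); the class name StackedCNNEncoder, the ReLU activations, and the dropout rate are illustrative choices, not requirements of the exercise.

```python
import torch
import torch.nn as nn


class StackedCNNEncoder(nn.Module):
    """Illustrative word-level encoder with a configurable number of CNN layers."""

    def __init__(self, input_dim, out_channels, num_layers=2,
                 kernel_size=3, dropout=0.5):
        super().__init__()
        layers = []
        in_channels = input_dim
        for _ in range(num_layers):
            layers += [
                nn.Conv1d(in_channels, out_channels, kernel_size,
                          padding=kernel_size // 2),
                nn.ReLU(),
                nn.Dropout(dropout),
            ]
            in_channels = out_channels    # later layers read the previous output
        self.net = nn.Sequential(*layers)

    def forward(self, embeds):
        # embeds: (seq_len, input_dim) -> (seq_len, out_channels)
        x = embeds.t().unsqueeze(0)
        return self.net(x).squeeze(0).t()


def receptive_field(num_layers, kernel_size=3):
    # With stride 1 and no dilation, each layer adds (kernel_size - 1) tokens.
    return 1 + num_layers * (kernel_size - 1)


embeds = torch.randn(9, 125)
for n in (1, 2, 3):
    enc = StackedCNNEncoder(125, 400, num_layers=n)
    enc.eval()                            # disable dropout for this shape check
    feats = enc(embeds)
    print(n, tuple(feats.shape), receptive_field(n))
```

Each added layer widens the context a token's features can see by kernel_size - 1 tokens, which is one reason deeper stacks may behave differently from a single layer.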