
What is Named Entity Recognition (NER)?

Named Entity Recognition is a use case within Natural Language Processing where a model learns to label certain words that belong to a particular group.

What is Named Entity Recognition?

When we humans try to understand a sentence, we quickly recognize individual words that belong to a particular class, such as a location, a time, or a person's name. Named Entity Recognition refers to models that do exactly this: they label specific words in a sentence or paragraph and assign them to the correct class.

Named Entity Recognition Example | Source: Analytics Vidhya

This information is indispensable for understanding the sentence’s content, so the model needs to recognize it correctly. The classification of words and sentence components takes place in several stages.

What are the Challenges of NER?

The problem with Natural Language Processing is that we have all been fluent in a natural language since early childhood and understand it without thinking. This makes it all the more difficult to formulate how we recognize entities in a text. For the model, this involves overcoming some challenges that seem self-evident to us:

  • Recognizing Variants: Names, place names or company names can appear in different variants. A person can be addressed either with the full name or only with the last name. The model must recognize that both mentions may refer to the same person. The same applies to the designations “New York”, “NYC” and “New York City”, which can all name the same major American city.
  • Normalization: Time or money references can appear in different formats and still mean the same thing. A NER model must also learn these differences, for example, to understand that “€10.000” and “€10,000” can denote the same amount, because many European locales separate thousands with a period while English uses a comma.
  • Delimitation of Entities: Finally, the boundaries between entities must also be recognized. One entity may consist of only a single word, while another spans several words, as in “New York City”.
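The first two challenges can be illustrated with a small sketch. It is not how a trained NER model solves them; it only shows what "the same entity in different surface forms" means. The alias table and the function names `normalize_city` and `normalize_euro` are hypothetical, and the money parser assumes that a period or comma followed by exactly three digits is a thousands separator.

```python
import re

# Hypothetical alias table; a real system would learn or look up such variants.
CITY_ALIASES = {
    "nyc": "New York City",
    "new york": "New York City",
    "new york city": "New York City",
}


def normalize_city(mention):
    """Map a known variant to one canonical name; unknown mentions pass through."""
    return CITY_ALIASES.get(mention.strip().lower(), mention)


def normalize_euro(amount):
    """Turn '€10.000' or '€10,000' into the integer 10000.

    Assumption: '.' or ',' followed by exactly three digits separates thousands.
    Amounts with decimal places are deliberately not handled in this sketch.
    """
    digits = amount.strip().lstrip("€").strip()
    if not re.fullmatch(r"\d{1,3}([.,]\d{3})*", digits):
        raise ValueError(f"unsupported amount format: {amount!r}")
    return int(re.sub(r"[.,]", "", digits))


print(normalize_city("NYC"))       # New York City
print(normalize_euro("€10.000"))   # 10000
print(normalize_euro("€10,000"))   # 10000
```

A fixed rule like this breaks quickly on real text (for example on “€10.50”), which is exactly why these cases count as challenges.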

What Steps are used for Named Entity Recognition?

If we want to train a Named Entity Recognition model, we need enough training data to feed it. To obtain this data automatically instead of classifying every entity by hand, we can use the following steps:

  1. Recognition of Nouns: Named entities are almost always nouns or proper nouns, so we filter the given text so that only these remain. Pretrained part-of-speech taggers that can do this already exist for many languages.
  2. Classification of Words: After we have filtered the nouns, we can classify them into the classes we want. For this, various free databases can be used to automate this step as much as possible. For example, we can query the Google Maps database via API to classify location information.
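The two steps above can be sketched as follows. To keep the example self-contained, the part-of-speech tags are written by hand instead of coming from a trained tagger, and a small dictionary (`GAZETTEER`, a hypothetical stand-in) replaces the external database lookup. The tag names `NOUN` and `PROPN` follow the Universal POS tag set; everything else is an assumption for illustration.

```python
# Toy input: (token, part-of-speech) pairs, as a POS tagger might produce them.
tagged = [
    ("Angela", "PROPN"),
    ("flew", "VERB"),
    ("to", "ADP"),
    ("Berlin", "PROPN"),
    ("yesterday", "NOUN"),
]

# Hypothetical lookup table standing in for an external database or API.
GAZETTEER = {"Berlin": "LOCATION", "Angela": "PERSON"}


def auto_label(tagged_tokens, gazetteer):
    """Step 1: keep nouns and proper nouns. Step 2: classify them by lookup."""
    nouns = [tok for tok, pos in tagged_tokens if pos in {"NOUN", "PROPN"}]
    # "O" (outside) marks nouns that are not found in any entity list.
    return [(tok, gazetteer.get(tok, "O")) for tok in nouns]


labels = auto_label(tagged, GAZETTEER)
print(labels)
# [('Angela', 'PERSON'), ('Berlin', 'LOCATION'), ('yesterday', 'O')]
```

Labels produced this way are noisy, so it is worth checking a sample by hand before using them for training.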

How does NER work?

In the Python libraries spaCy and NLTK, you can easily load pretrained Named Entity Recognition models, which already work well for widely used languages. However, you may also need to train your own NER model in order to tune it better for your own use case.

Before we can start with the actual training, we need a training dataset with enough example texts and the entities to be found within them. If you want to train the model on special cases, there is often no way around creating the dataset yourself and labeling the words or phrases by hand.
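What such a hand-labeled example can look like is sketched below. The article does not prescribe a format; the BIO scheme used here (B = beginning, I = inside, O = outside of an entity) is a common convention and an assumption on our part, as are the function name `spans_to_bio` and the label names `PER` and `LOC`.

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans into one BIO tag per token.

    spans: list of (start_index, end_index_exclusive, label) over the tokens.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


tokens = ["Tim", "Cook", "visited", "New", "York", "City"]
spans = [(0, 2, "PER"), (3, 6, "LOC")]      # annotated by hand

bio = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, bio):
    print(f"{token}\t{tag}")
```

Marking the first token separately lets the model learn where one entity ends and the next begins, even when two entities stand directly next to each other.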

Subsequently, a so-called Conditional Random Field (CRF) can be trained for Named Entity Recognition. It is a statistical model that is particularly well suited for recognizing patterns in sequences and that also includes context information in the prediction.

Explained in simple terms, a Conditional Random Field can be thought of as a logistic regression extended from single words to whole sequences. Its feature functions use the following values as input:

  • Set of input vectors
  • Position of the word to be currently predicted
  • Label of the previous word
  • Label of the current word

From these inputs the model can learn, for example, that verbs often follow nouns, and it can use such regularities to infer the most likely label for each word.
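How the four inputs listed above interact can be shown with a minimal sketch of a linear-chain CRF at prediction time. It is not a training procedure: the weights are set by hand purely for illustration, whereas a real CRF estimates them from labeled data. The feature names, the label set and the `START` marker are assumptions, and the best label sequence is found with the Viterbi algorithm.

```python
LABELS = ["O", "PER", "LOC"]


def features(words, i, prev_label, label):
    """Feature function: sees the whole input, the position, and both labels."""
    word = words[i]
    return [
        f"word={word.lower()}|label={label}",
        f"capitalized={word[0].isupper()}|label={label}",
        f"prev={prev_label}|label={label}",
    ]


# Hand-set toy weights (an assumption); training would learn these from data.
WEIGHTS = {
    "capitalized=True|label=PER": 1.0,
    "capitalized=False|label=O": 2.0,
    "word=berlin|label=LOC": 3.0,
    "prev=PER|label=PER": 0.5,
}


def local_score(words, i, prev_label, label):
    """Sum of the weights of all features that fire at position i."""
    return sum(WEIGHTS.get(f, 0.0) for f in features(words, i, prev_label, label))


def viterbi(words):
    """Return the label sequence with the highest total score."""
    # best[i][label] = (score of best path ending in label at i, previous label)
    best = [{lab: (local_score(words, 0, "START", lab), None) for lab in LABELS}]
    for i in range(1, len(words)):
        column = {}
        for lab in LABELS:
            column[lab] = max(
                (best[i - 1][prev][0] + local_score(words, i, prev, lab), prev)
                for prev in LABELS
            )
        best.append(column)
    # Follow the stored back-pointers from the best final label.
    label = max(LABELS, key=lambda lab: best[-1][lab][0])
    path = [label]
    for i in range(len(words) - 1, 0, -1):
        label = best[i][label][1]
        path.append(label)
    return path[::-1]


print(viterbi(["Anna", "Schmidt", "visited", "Berlin"]))
# ['PER', 'PER', 'O', 'LOC']
```

The weight on `prev=PER|label=PER` is what makes the context matter: "Schmidt" is labeled as a person partly because the word before it was.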

What do you use NER for?

Named Entity Recognition can be used in many areas. The following examples are therefore only a selection of possible use cases:

  • Human Resources: Special models can be used to quickly extract information from applicants’ resumes.
  • Search Algorithms: In product searches, for example, product names and product characteristics can be recognized and searched for separately. This sharpens the search result: for the search term “iPhone 12”, the number twelve is then matched against product titles as part of the product name rather than treated as a product property.
  • Customer Service: Inquiries from customers can also be better classified and filtered by recognizing keywords. This reduces the response time of the employee.

This is what you should take with you

  • Named Entity Recognition models learn to assign single words or sequences to a group.
  • One way to do this is to train a so-called Conditional Random Field, which classifies each word depending on the sequence it appears in.
  • A good NER model is characterized by the recognition of variants and the good delimitation of entities.
