Building a Baseline for Entity Linking Without Training Any Model

In this article we show a simple method for building a baseline for the Entity Linking task. Entity Linking is a challenging task that matters in both academic research and industry.


For Entity Linking, the input is a text given as a sequence of words from a dictionary. The output of an Entity Linking model is a list of mention-entity pairs, where each mention is a word subsequence of the input text and each entity is an entry in a Knowledge Base (e.g. Wikipedia).

End-to-End Neural Entity Linking, Nikolaos Kolitsas, Octavian-Eugen Ganea, Thomas Hofmann.

Note: some authors define the Entity Linking task so that the mentions to be linked to the Knowledge Base are also given as input, which reduces the complexity of the task.

The baseline will not require any training data. To show its performance we will apply it to a recently created dataset used in the Entity Linking research community.

The baseline is built with the recent directions of research in this area in mind. Recent works have shown that Cross-Encoders and Bi-Encoders are very powerful in the Entity Linking subtasks of Candidate Entity Generation and Candidate Entity Ranking.

Bi-Encoder, credit:
Cross-Encoder, credit:

They made it possible to reach a new state of the art on several Entity Linking datasets. For example, in Scalable Zero-shot Entity Linking with Dense Entity Retrieval (Ledell Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, Luke Zettlemoyer), a Bi-Encoder was used for the Candidate Entity Generation task and a Cross-Encoder for the Candidate Entity Ranking task, obtaining impressive results in Zero-Shot contexts.

For these reasons, a pretrained Cross-Encoder will be used for the Candidate Entity Ranking task, so that the baseline can later be easily optimized on a specific dataset to obtain state-of-the-art results.

Entity Linking is generally divided into four subtasks: Mention Detection, Candidate Entity Generation, Candidate Entity Ranking, Unlinkable Mention Prediction.

This is not the only possible division of the Entity Linking task: several articles solve the task using other divisions or by merging some of the subtasks listed above.

In the Mention Detection task, given a text, we want to identify all the subtexts that we are interested in linking against a Knowledge Base; each of these subtexts is called a Candidate Mention. In the Candidate Entity Generation task, for each Candidate Mention, we extract a set of Candidate Entities that could be linked to the Candidate Mention. In the Candidate Entity Ranking task, for each Candidate Mention, we rank all of its Candidate Entities with a similarity measure that puts on top the candidates most likely to be linked to the Candidate Mention. Finally, in the Unlinkable Mention Prediction task, the top-ranked Candidate Entity and the Candidate Mention are generally fed to a classifier that decides whether the Candidate Mention is linkable to the Candidate Entity or not.

We will not go into the details of each of these subtasks, both for lack of time and because this is not a review. If you want to read more about them, several reviews are available online.

In this article we will implement a solution for two of these subtasks: Candidate Entity Generation and Candidate Entity Ranking.

We will assume that we have a Knowledge Base consisting of a set of entities, each with an id, a title and a description. We will also assume that no two entities share both the same title and the same description. In this context the Knowledge Base can be represented simply by a table with three columns:

Id: 000523A4D586C293
Title: Warner
Description: Warner Warner was a communications technician aboard Nerva Beacon ...

Id: 0009247003C7CB16
Title: Winnie Tyler
Description: Winnie Tyler Winnie Tyler was Jacob Tyler ' s wife ...
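
As a rough illustration, the three-column table can be held in memory as a dictionary keyed by id. This is only a sketch: the `Entity` class and the `build_knowledge_base` helper are hypothetical names introduced here, not part of the dataset or of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    title: str
    description: str


def build_knowledge_base(rows):
    """Index entities by id, rejecting duplicated (title, description) pairs."""
    kb = {}
    seen = set()
    for entity_id, title, description in rows:
        key = (title, description)
        if key in seen:
            raise ValueError(f"duplicate title and description: {title!r}")
        seen.add(key)
        kb[entity_id] = Entity(entity_id, title, description)
    return kb


# The two rows shown above, with descriptions truncated as in the article.
kb = build_knowledge_base([
    ("000523A4D586C293", "Warner",
     "Warner Warner was a communications technician aboard Nerva Beacon ..."),
    ("0009247003C7CB16", "Winnie Tyler",
     "Winnie Tyler Winnie Tyler was Jacob Tyler ' s wife ..."),
])
print(kb["000523A4D586C293"].title)
```

The duplicate check simply enforces the assumption stated above that a title and description pair identifies a single entity.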

Our dataset will contain a Knowledge Base, as described before, and a set of texts containing mentions of the entities described in the Knowledge Base. We will assume that each mention can be linked to an entity in the Knowledge Base, which avoids the difficulties of unlinkable mentions.

Few enemies can use the spell , being <Leviathan> , Zeromus EG , Golbez , Dark Elf , Mindy , and Shiva . On the " Easy Type " version , the casting time was reduce to 2 and the spell was renamed to Blizzard 2 . " Final Fantasy IV - Interlude - .
Mention: <Leviathan>
Entity: 0005...

The difficulties of the task are that the mention of an entity can be very different from its title, and that one mention can be similar to several titles, which makes disambiguation necessary to identify the correct one.

In most works, the Candidate Entity Generation task doesn’t consider the disambiguation problem: it only compares the mention to the titles of the entities in the Knowledge Base, extracting those entities whose title is “similar” to the mention. The disambiguation problem is instead generally addressed in the Candidate Entity Ranking task, where the mention, with its context (a window around the mention’s text), is compared to the descriptions of the candidate entities in order to decide which one is most “similar” to the mention.

We will use the Wikia Dataset defined in the article: Zero-Shot Entity Linking by Reading Entity Descriptions, Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee.

The dataset is provided by the authors of the article on GitHub. For more information about the dataset we suggest reading the article, which is very well written.

Because we will not train any model, we will apply the baseline to the validation set in order to compute its performance and compare it with the performance of the baselines presented in the cited article.

The validation set is composed of four subdatasets, each one corresponding to a different Wikia. To measure the performance of their models, the authors computed the Accuracy on each subdataset and then averaged the results (Macro Average). They mostly used the Normalized Accuracy, which is the accuracy calculated on the subset of instances for which the gold entity is among the top-k candidates retrieved during the Candidate Generation step. They used this metric because they were more interested in comparing the methods on the disambiguation task, and they used the same Candidate Generation model for all the methods. They used the BM25 model for the Candidate Generation step, obtaining a Recall of 76% on the Validation set.

Because we are using a different Candidate Generation model than they did, we are more interested in the Unnormalized Macro Average Accuracies.

The Unnormalized Accuracy is the plain accuracy: it doesn’t filter out the errors made by the Candidate Generation model, so they count against the accuracy estimate.
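
The difference between the two metrics, and the Macro Average over subdatasets, can be sketched as follows. The function names and the toy examples are assumptions made for illustration; they are not taken from the paper or from its code.

```python
def accuracies(examples):
    """examples: list of (gold_id, candidate_ids, predicted_id) tuples.

    Returns (normalized_accuracy, unnormalized_accuracy, recall).
    """
    total = len(examples)
    # Instances for which Candidate Generation retrieved the gold entity.
    retrieved = [ex for ex in examples if ex[0] in ex[1]]
    correct = sum(1 for gold, _, pred in examples if pred == gold)
    correct_retrieved = sum(1 for gold, _, pred in retrieved if pred == gold)
    normalized = correct_retrieved / len(retrieved) if retrieved else 0.0
    unnormalized = correct / total if total else 0.0
    recall = len(retrieved) / total if total else 0.0
    return normalized, unnormalized, recall


def macro_average(values):
    """Average of per-subdataset scores, each subdataset weighted equally."""
    return sum(values) / len(values)


# Toy subdatasets (invented, not real results).
sub_a = [
    ("e1", ["e1", "e2"], "e1"),  # gold retrieved, prediction correct
    ("e2", ["e2", "e3"], "e2"),  # gold retrieved, prediction correct
    ("e3", ["e1", "e3"], "e1"),  # gold retrieved, prediction wrong
    ("e4", ["e5"], "e5"),        # gold missed by Candidate Generation
]
sub_b = [
    ("e1", ["e1"], "e1"),
    ("e2", ["e3"], "e3"),
]

norm_a, unnorm_a, recall_a = accuracies(sub_a)
norm_b, unnorm_b, recall_b = accuracies(sub_b)
print(macro_average([norm_a, norm_b]), macro_average([unnorm_a, unnorm_b]))
```

In the toy data the Normalized Accuracy ignores the instances whose gold entity was never retrieved, while the Unnormalized Accuracy counts them as errors.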

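Fortunately, a method’s Unnormalized Accuracy can never exceed its own Normalized Accuracy. So if our Unnormalized Accuracy is greater than a baseline’s Normalized Accuracy, then our Normalized Accuracy is greater too, and our Unnormalized Accuracy is also greater than that baseline’s Unnormalized Accuracy.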

Theoretically, if they had used the Micro Average instead of the Macro Average to aggregate the results obtained on the different subdatasets, then it would have been possible to obtain the Unnormalized Accuracy from the Normalized Accuracy, knowing the Recall of the Candidate Generation step.
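
A small numeric check may make this clearer. With pooled counts (Micro Average), the Unnormalized Accuracy equals the Normalized Accuracy times the Recall; with per-subdataset averages (Macro Average), the same product generally does not give the right value. The counts below are invented for illustration only.

```python
# Hypothetical per-subdataset counts, not real results:
# (total mentions, mentions whose gold entity was retrieved, correct links)
counts = [(100, 90, 45), (300, 150, 30)]

total = sum(n for n, _, _ in counts)
retrieved = sum(r for _, r, _ in counts)
correct = sum(c for _, _, c in counts)

# Micro average: pool the counts, then compute each metric once.
micro_recall = retrieved / total
micro_normalized = correct / retrieved
micro_unnormalized = correct / total

# Macro average: compute each metric per subdataset, then average.
k = len(counts)
macro_recall = sum(r / n for n, r, _ in counts) / k
macro_normalized = sum(c / r for _, r, c in counts) / k
macro_unnormalized = sum(c / n for n, _, c in counts) / k

print("micro:", micro_normalized * micro_recall, "vs", micro_unnormalized)
print("macro:", macro_normalized * macro_recall, "vs", macro_unnormalized)
```

With these counts the micro product matches the Unnormalized Accuracy exactly, while the macro product does not, which is why only an approximation is possible from the published Macro Averages.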

To approximate (by overestimating) the Unnormalized Macro Average Accuracy from the Normalized Macro Average Accuracy, we can also guess an upper bound on the recall obtained on the individual subdatasets. The reported Recall is 76%; we assume that no subdataset exceeds 80%, so multiplying by 0.80 gives an overestimate.

Normalized Macro Average Accuracies of the baselines (Table 4 in the paper)

Edit-distance: 16.49
TF-IDF: 26.06
Ganea and Hofmann (2017): 26.96
Gupta et al. (2017): 27.03

Approximated Unnormalized Macro Average Accuracies of the baselines:

Edit-distance: 16.49*0.80 = 13.192
TF-IDF: 26.06*0.80 = 20.848
Ganea and Hofmann (2017): 26.96*0.80 = 21.568
Gupta et al. (2017): 27.03*0.80 = 21.624

In any case, the baseline that we will construct obtains an Unnormalized Macro Average Accuracy greater than both the Normalized Macro Average Accuracies and the Approximated Unnormalized Macro Average Accuracies of these baselines.

Our method for Candidate Entity Generation is very simple. We decided to keep it simple because, in most of the industrial cases we work on, the mention is generally very similar to the titles written in our Knowledge Bases.

For each mention, we consider as Candidate Entities all the entities whose title starts with the mention’s text. If the mention’s text ends with an ‘s’ and has a length greater than 1, we remove the last character (an easy way to handle plurals). Before the comparisons, all the texts are lowercased.
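
A minimal sketch of this rule is shown below. The function names, the `titles_by_id` mapping and the toy ids are assumptions for illustration; lowercasing before the trailing ‘s’ check is also an assumption, since the text does not fix the order of the two steps.

```python
def normalize_mention(mention):
    """Lowercase the mention and drop a trailing 's' (naive plural handling)."""
    text = mention.lower()
    if text.endswith("s") and len(text) > 1:
        text = text[:-1]
    return text


def generate_candidates(mention, titles_by_id):
    """Return the ids of the entities whose lowercased title starts with the mention."""
    prefix = normalize_mention(mention)
    return [
        entity_id
        for entity_id, title in titles_by_id.items()
        if title.lower().startswith(prefix)
    ]


# Toy Knowledge Base titles with invented ids.
titles_by_id = {
    "a1": "Leviathan",
    "a2": "Leviathan (Final Fantasy IV)",
    "a3": "Shiva",
}
print(generate_candidates("Leviathans", titles_by_id))
print(generate_candidates("shiva", titles_by_id))
```

Note that the plural rule also shortens singular mentions that happen to end in ‘s’; because the match is a prefix match, the shortened text still matches the full title.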

Note: this method is very simple, but it works in most of the cases we work on. In some cases it has been necessary to construct a more sophisticated method, but… this is a baseline :-)

For each mention, the previous method may return a set of Candidate Entities with a cardinality greater than 1, so we need to rank them in order to decide which one is the most likely.

For this task, we initially wanted to use a state-of-the-art pretrained model that could be easily integrated into our baseline (a few lines of code) and that performs well in Zero-Shot contexts (we don’t want to fine-tune our baseline). Unfortunately, we didn’t find anything that met our criteria.

Fortunately we had an idea: in some way this task is very similar to Passage Re-Ranking, where, given a query and a set of texts, the texts are ordered by their relevance to the query. SentenceTransformers, a well-known and useful library for sentence embeddings, provides a set of models pretrained on a massive Passage Re-Ranking dataset: MS MARCO. This dataset is so big and so diverse that we thought that models trained on it could obtain decent results on the Candidate Entity Ranking task without fine-tuning.

The model used is: sentence-transformers/ce-ms-marco-electra-base.

For each pair of mention context and candidate entity description, we treat the first as a query and the second as a passage. We compute the similarity of these two texts and rank the candidates by this score.
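
The ranking step can be sketched as below. The scoring function is passed in as a parameter so that the ranking logic can be tried without downloading a model; `rank_candidates`, `load_cross_encoder_scorer` and `overlap_scorer` are hypothetical helper names. The loader assumes the `sentence-transformers` package is installed and uses its `CrossEncoder` class with the model name quoted in this article, which may need adjusting if the model is published under a different name.

```python
def rank_candidates(context, candidates, score_pairs):
    """Rank (entity_id, description) candidates by relevance to the context.

    score_pairs takes a list of (query, passage) pairs and returns one
    score per pair, higher meaning more relevant.
    """
    if not candidates:
        return []
    pairs = [(context, description) for _, description in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [(entity_id, float(score)) for (entity_id, _), score in ranked]


def load_cross_encoder_scorer(model_name="sentence-transformers/ce-ms-marco-electra-base"):
    """Return a scorer backed by a pretrained Cross-Encoder (downloads the model)."""
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)
    return model.predict


def overlap_scorer(pairs):
    """Toy stand-in scorer (word overlap), only to exercise the ranking logic."""
    return [len(set(q.lower().split()) & set(p.lower().split())) for q, p in pairs]


context = "Few enemies can use the spell , being Leviathan"
candidates = [
    ("e1", "Shiva is an ice summon"),
    ("e2", "Leviathan is a sea serpent enemy that can use the spell"),
]
ranked = rank_candidates(context, candidates, overlap_scorer)
print(ranked[0][0])
```

To use the pretrained model instead of the toy scorer, one would pass `load_cross_encoder_scorer()` as the third argument; the top-ranked entity is then taken as the prediction.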

The results are obtained by measuring the Unnormalized Macro Average Accuracy on the Validation set of the Wikia Dataset. The window for the mention’s context is 10 tokens to the left and 10 tokens to the right of the mention’s text, using a whitespace tokenizer.
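
The context window can be sketched as follows, assuming the mention is given as a span of whitespace-token indices; the `mention_context` name and its parameters are hypothetical, and the example text is a shortened version of the sample shown earlier.

```python
def mention_context(text, mention_start, mention_end, window=10):
    """Return the mention plus up to `window` whitespace tokens on each side.

    mention_start and mention_end are token indices (end exclusive).
    """
    tokens = text.split()
    left = max(0, mention_start - window)
    right = min(len(tokens), mention_end + window)
    return " ".join(tokens[left:right])


text = (
    "Few enemies can use the spell , being Leviathan , Zeromus EG , "
    "Golbez , Dark Elf , Mindy , and Shiva ."
)
# "Leviathan" is token 8; a window of 3 is used here only to keep the output short.
print(mention_context(text, 8, 9, window=3))
```

With the default window of 10, the context is clipped at the start or end of the text when fewer tokens are available on one side.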

Unnormalized Macro Average Accuracy: 0.295

As explained above when describing the dataset, we didn’t find the Unnormalized Macro Average Accuracy for the baselines used in the article that introduced the dataset; only the Normalized Macro Average Accuracies were available. These are, by definition, greater than or equal to their Unnormalized counterparts. Fortunately, we obtain results better than the Normalized versions :-)

The presented method is very simple, but it performs better than the baselines generally used for the Entity Linking task.

A more sophisticated solution for the Candidate Entity Generation task, maybe using a pretrained Bi-Encoder or something else (Bi-Encoders pretrained on MS MARCO are available in SentenceTransformers).

An application of this baseline to a possible industrial use case.

Shareable code on GitHub.

SentenceTransformers, a Python framework for state-of-the-art sentence and text embeddings.

Kolitsas, N., Ganea, O.E. and Hofmann, T. (2018). End-to-End Neural Entity Linking. arXiv preprint arXiv:1808.07699.

Logeswaran, L., Chang, M. W., Lee, K., Toutanova, K., Devlin, J., & Lee, H. (2019). Zero-Shot Entity Linking by Reading Entity Descriptions. arXiv preprint arXiv:1906.07348.

Wu, L., Petroni, F., Josifoski, M., Riedel, S., & Zettlemoyer, L. (2019). Scalable Zero-shot Entity Linking with Dense Entity Retrieval. arXiv preprint arXiv:1911.03814.

Bajaj, P., Campos, D., Craswell, N., Deng, L., Gao, J., Liu, X., Majumder, R., McNamara, A., Mitra, B., Nguyen, T. and Rosenberg, M. (2016). MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. arXiv preprint arXiv:1611.09268.

Shen, W., Wang, J., & Han, J. (2014). Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions. IEEE Transactions on Knowledge and Data Engineering, 27(2), 443–460.
