Document level relation extraction for knowledge graph generation
Title: Document level relation extraction for knowledge graph generation
SNIC Project: LiU-gpu-2021-4
Project Type: LiU Compute
Principal Investigator: Eva Blomqvist
Affiliation: Linköpings universitet
Duration: 2021-05-27 – 2022-06-01
Classification: 10208


This project explores the automatic generation of knowledge graphs (KGs) from text by exploiting data and models available for the closely related task of document-level relation extraction (RE). For our experiments, we have chosen the Document-level Relation Extraction Dataset (DocRED). This dataset defines a Natural Language Processing task in which a model (usually based on a transformer architecture such as BERT or RoBERTa) is expected to output a series of facts extracted from a given document. These facts take the form of subject-predicate-object triples, where the subject and object are two entities from the document, and the predicate is a relation that holds between those entities. In particular, we wish to investigate the following three questions. First, what are the current pitfalls and drawbacks of readily available state-of-the-art systems, both from an architectural point of view and with regard to the quality of their output on the DocRED benchmark? Second, to what extent can we exploit the output of these RE systems to generate useful KGs without modifying the underlying systems? Finally, we want to use these results to enumerate directions for future work, such as identifying particular classes of errors shared across the systems that need addressing, or determining how newer techniques such as debiasing with counterfactual reasoning can be applied.
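To make the triple format concrete, the following is a minimal Python sketch of how relation labels for one document might be turned into subject-predicate-object triples and grouped into a simple graph. It is an illustration only, not the project's pipeline. The field names (`vertexSet`, `labels`, `h`, `t`, `r`) are assumed to follow the released DocRED JSON layout, the toy record is hypothetical, and `Triple`, `triples_from_document`, and `build_graph` are names introduced here for illustration. Using the first mention as the entity label is likewise a simplifying assumption.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """One extracted fact: subject entity, relation, object entity."""
    subject: str
    predicate: str
    obj: str


def triples_from_document(doc: dict) -> List[Triple]:
    """Turn one DocRED-style record into subject-predicate-object triples."""
    # Each entity is a cluster of mentions; the first mention's name is used
    # as a simple (assumed) canonical label for the entity.
    names = [mentions[0]["name"] for mentions in doc["vertexSet"]]
    return [
        Triple(names[label["h"]], label["r"], names[label["t"]])
        for label in doc["labels"]
    ]


def build_graph(triples: List[Triple]) -> Dict[str, Set[Tuple[str, str]]]:
    """Group triples into an adjacency map: subject -> {(predicate, object)}."""
    graph = defaultdict(set)
    for triple in triples:
        graph[triple.subject].add((triple.predicate, triple.obj))
    return dict(graph)


# Hypothetical toy record, not taken from DocRED itself.
example_doc = {
    "vertexSet": [
        [{"name": "Linköping University"}, {"name": "LiU"}],
        [{"name": "Sweden"}],
    ],
    # "h" and "t" index into vertexSet; "r" is a Wikidata property ID
    # (P17 is the "country" property).
    "labels": [{"h": 0, "t": 1, "r": "P17"}],
}

triples = triples_from_document(example_doc)
graph = build_graph(triples)
```

The same two functions could in principle be applied to a system's predicted labels instead of gold labels, which is the setting of the second question above: the RE model is left unmodified and only its output is post-processed into a graph.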