Literature and learning from the past

Scientific knowledge has been accumulated for hundreds or even thousands of years. By 'accumulation', we mean that scientific discoveries have been recorded and published, mostly through scientific literature. Access to this accumulated knowledge makes the development of new knowledge more efficient.

As technology progresses, however, knowledge is expanding at an ever-increasing pace, and it is becoming almost impossible for human researchers to keep up with even their own areas of expertise.

Structured databases

As human knowledge accumulates, instant access to the necessary pieces of knowledge becomes more and more important. Many structured databases have been developed to make such access efficient. The life sciences in particular are rich in public databases, e.g. Entrez Gene, UniProt, PubChem.

To fully benefit from these databases, however, relevant entities across multiple databases need to be interlinked.

Linked Data

Linked Data (LD) is emerging as a new way of publishing data. LD enables relevant pieces of data across multiple databases to be linked to each other through a standard protocol. It may be said that while the development of structured databases (mostly relational databases) increased the amount of data held in databases, the technology of LD and the Semantic Web is now significantly improving the linkage between those pieces of data.

Compared to knowledge represented in scientific literature, however, the pieces of knowledge in structured databases or linked data often lack their contexts, e.g., experimental environments.

Linked Annotation

Because the context of an individual piece of data is often described in the scientific literature that refers to it, finding references to entities in the literature and linking them to the corresponding database entries is an important process which

  • restores the contexts of the entities, from the perspective of databases, and

  • indexes the contexts of literature by the entries of databases, from the perspective of literature.
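The two perspectives above can be illustrated with a minimal Python sketch. It is not how PubAnnotation or any particular tool works: the exact-match lookup, the `Annotation` record, the helper names, the toy documents, and the "ExampleDB" identifiers are all assumptions made for illustration. Each annotation ties a character span in a text to a database entry; reading the links one way gives the contexts of an entry, and reading them the other way gives an index of the literature by entry.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    doc_id: str  # hypothetical document identifier
    begin: int   # character offset where the mention starts
    end: int     # character offset just past the mention
    entry: str   # hypothetical database entry identifier


def annotate(doc_id, text, lexicon):
    """Link every exact occurrence of a lexicon term to its database entry."""
    found = []
    for term, entry in lexicon.items():
        start = text.find(term)
        while start != -1:
            found.append(Annotation(doc_id, start, start + len(term), entry))
            start = text.find(term, start + 1)
    return sorted(found, key=lambda a: (a.doc_id, a.begin, a.end))


def contexts_by_entry(docs, annotations, window=20):
    """Database perspective: map each entry to the text around its mentions."""
    contexts = defaultdict(list)
    for a in annotations:
        text = docs[a.doc_id]
        contexts[a.entry].append(text[max(0, a.begin - window):a.end + window])
    return dict(contexts)


def docs_by_entry(annotations):
    """Literature perspective: index documents by the entries they mention."""
    index = defaultdict(set)
    for a in annotations:
        index[a.entry].add(a.doc_id)
    return dict(index)


# Toy data; the documents and the "ExampleDB" identifiers are made up.
docs = {
    "doc1": "p53 is activated by DNA damage in HeLa cells.",
    "doc2": "Loss of p53 promotes tumor growth.",
}
lexicon = {"p53": "ExampleDB:1", "HeLa": "ExampleDB:2"}

annotations = [a for doc_id, text in docs.items()
               for a in annotate(doc_id, text, lexicon)]
contexts = contexts_by_entry(docs, annotations)
index = docs_by_entry(annotations)
```

Here `contexts` holds, for each entry, the snippets of text surrounding its mentions, while `index` lists the documents in which each entry is mentioned. A real system would replace the naive string matching with proper entity recognition and normalization.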

Google Maps vs Linked Annotation

Linked Annotation is conceptually similar to Google Maps, which links various entities and structures to 2-dimensional unstructured data (a map).

Linked Annotation, in turn, links various entities and their structures to 1-dimensional unstructured data (text).


We regard Google Maps as one of the most successful public-sourcing annotation systems: users can easily create annotations (geographical annotations) and share them with anyone else.

PubAnnotation is an annotation repository developed to implement a Google Maps-like system for public-sourcing and publishing annotations to literature.


BLAH is organized to develop linked literature annotation as a community effort. Over the last decades, the BioNLP community has made substantial progress in producing various annotations to the biomedical literature. Now it is time to put more effort into improving the accessibility of these invaluable resources. Through the linked annotation effort, we believe that both the accessibility and the productivity of annotation may be significantly improved.

The first BLAH (BLAH1)

The first BLAH was organized in February 2015. Participants collaborated to collect various annotations, and many of them were successfully integrated into PubAnnotation. As a result, annotations from different projects and groups were linked to each other through normalized texts. We call this horizontal linking of annotations.

Integration of other types of annotation resources, e.g. annotation systems, was also explored, and encouraging possibilities were observed.
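One possible reading of horizontal linking is sketched below in Python: once annotations from different projects refer to the same normalized text, their character offsets become comparable, so annotations whose spans overlap can be paired up. The project names, labels, example sentence, and helper functions are hypothetical, and the overlap criterion is an assumption made for illustration rather than the procedure actually used at BLAH1.

```python
def span_of(text, phrase, label):
    """Return (begin, end, label) for the first occurrence of phrase in text."""
    begin = text.index(phrase)
    return (begin, begin + len(phrase), label)


def overlaps(a, b):
    """True if two (begin, end, label) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def link_horizontally(by_project):
    """Pair up overlapping annotations that come from different projects."""
    links = []
    projects = sorted(by_project)
    for i, p in enumerate(projects):
        for q in projects[i + 1:]:
            for a in by_project[p]:
                for b in by_project[q]:
                    if overlaps(a, b):
                        links.append(((p, a), (q, b)))
    return links


# Toy data: three hypothetical projects annotating the same normalized text.
text = "BRCA1 mutations increase breast cancer risk."
by_project = {
    "project_genes": [span_of(text, "BRCA1", "Gene")],
    "project_variants": [span_of(text, "BRCA1 mutations", "Variant")],
    "project_diseases": [span_of(text, "breast cancer", "Disease")],
}
links = link_horizontally(by_project)
```

In this toy case the gene annotation and the variant annotation are linked because both cover the mention "BRCA1", while the disease annotation stays unlinked because no other project annotated that part of the text.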

The second BLAH (BLAH2)

We are organizing the second BLAH to build more linkage upon what was done during and after BLAH1. In particular, this time we would like to link annotations from a more semantic side. We call this vertical linking of annotations; it will significantly improve the semantic linkage of the annotations, as well as the linkage between the literature and linked data.