Cocoa Compact cover annotator for biological noun phrases

The extended annotation set also provides fine-grained subcategorization of anatomical entities, physiological conditions, and organisms. It also marks up nested entities, such as 'liver' within the disease term 'liver cancer', or 'glucose-6-phosphate' as a chemical within the protein mention 'glucose-6-phosphate dehydrogenase'. Extended annotation tags are currently available only through a Web API call (see below).
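In the standoff output of the 'b1' format (see the Web API example below), a nested entity is an annotation whose character span lies inside another annotation's span. For example, 'Liver' (15 to 20) lies inside 'Liver cancer' (15 to 27). The following is a minimal client-side sketch for finding such pairs. The function name nested_pairs is hypothetical, and it assumes one contiguous span per annotation in the ID, tag, start, end, text layout of the sample output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list annotations whose span lies inside another one.
# Assumes lines like "T4<TAB>Organ 15 20<TAB>Liver" (one contiguous span each);
# awk splits on any whitespace, so $3 and $4 are the start and end offsets.
nested_pairs() {
  awk '
    $1 ~ /^T/ { n++; id[n] = $1; s[n] = $3 + 0; e[n] = $4 + 0 }
    END {
      for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
          # Span i is contained in span j and the two spans are not identical.
          if (i != j && s[i] >= s[j] && e[i] <= e[j] && (s[i] != s[j] || e[i] != e[j]))
            print id[i] " in " id[j]
    }'
}

# Sample annotations copied from the Web API example below.
nested_pairs <<'EOF'
T1	Disease 15 27	Liver cancer
T2	Cellular_component 29 43	chromatophores
T3	Organism1 48 54	tigers
T4	Organ 15 20	Liver
EOF
# prints: T4 in T1
```

A quadratic pairwise comparison is used because a single response carries only a handful of annotations.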

The anatomical subcategories generally follow the guidelines in [Ref 1]. Specifically, Cocoa has the following divisions: GO cellular component, Cell, Tissue, Multi Tissue structure, Organ, Organism sub division, Developing anatomy, and Pathological formation. The tag names mirror the division names, with underscores in place of spaces: for example, 'Pathological_formation' for 'Pathological formation' (note, though, that GO cellular components appear as 'Cellular_component' in the sample output below). Anatomical subcategorization is a work in progress, and items that are not yet subcategorized are still tagged as 'Body part'. At present, the 'GO cellular component' and 'Cell' subcategories work reasonably reliably (see the evaluation results against selected corpora).

Physiological conditions are also subcategorized; again, this is a work in progress. The corresponding tags are 'Disease', 'Bio_Process', and 'Disease_Process', and uncategorized items are still tagged as 'Physiology'.
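When extended output has to be compared with the coarser compact annotation, the physiology subcategories can be folded back into a single class. The following is a minimal sketch. It assumes the tag names listed above and the standoff layout of the sample output. collapse_physiology is a hypothetical name, not part of Cocoa.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite Disease, Bio_Process and Disease_Process
# annotations as Physiology, leaving every other line unchanged.
# Only the tag right after the ID is touched, so mention text is never edited.
collapse_physiology() {
  sed -E 's/^(T[0-9]+[[:space:]]+)(Disease|Bio_Process|Disease_Process)([[:space:]])/\1Physiology\3/'
}

collapse_physiology <<'EOF'
T1	Disease 15 27	Liver cancer
T4	Organ 15 20	Liver
EOF
# T1 is now tagged Physiology; T4 keeps its Organ tag.
```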

Organisms are divided into three subcategories, with the tags 'Organism', 'Organism1', and 'Organism2'.
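Because organism mentions are spread across three tags, a client that wants every organism mention has to match all three. The following is a minimal sketch. organism_mentions is a hypothetical helper, and the standoff layout is assumed from the sample output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only annotations tagged Organism, Organism1 or Organism2.
# awk splits on whitespace, so $2 is the tag that follows the annotation ID.
organism_mentions() {
  awk '$1 ~ /^T/ && ($2 == "Organism" || $2 == "Organism1" || $2 == "Organism2")'
}

organism_mentions <<'EOF'
T2	Cellular_component 29 43	chromatophores
T3	Organism1 48 54	tigers
EOF
# prints only the T3 line
```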

To request extended annotations, set the 'outputFormat' parameter of the Web API call to 'b1':

$ curl -d "outputFormat=b1&apikey=1234&text=A smorgasbord: Liver cancer, chromatophores and tigers." /Cocoa/api/
T1	Disease 15 27	Liver cancer
T2	Cellular_component 29 43	chromatophores
T3	Organism1 48 54	tigers
T4	Organ 15 20	Liver
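Each output line gives an ID, a tag, start and end character offsets, and the mention text. The offsets in this sample are consistent with 0-based positions and an exclusive end: 'Liver cancer' starts at character 15 of the input and ends before character 27. The following is a minimal sketch for checking that reading against the submitted text. check_offsets is a hypothetical name, and the offset convention is inferred from this one example rather than from documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that each annotation's offsets select its mention
# text, assuming 0-based, end-exclusive character offsets (as in the sample).
# Usage: check_offsets TEXT < annotations; returns 1 if any line mismatches.
check_offsets() {
  local text=$1 id type start end mention span rc=0
  # The last variable, mention, receives the rest of the line, spaces included.
  while read -r id type start end mention; do
    span=${text:start:end-start}   # bash substring: offset, then length
    if [[ $span == "$mention" ]]; then
      echo "$id ok"
    else
      echo "$id mismatch: '$span' vs '$mention'"
      rc=1
    fi
  done
  return "$rc"
}

check_offsets 'A smorgasbord: Liver cancer, chromatophores and tigers.' <<'EOF'
T1	Disease 15 27	Liver cancer
T2	Cellular_component 29 43	chromatophores
T3	Organism1 48 54	tigers
T4	Organ 15 20	Liver
EOF
# prints "T1 ok" through "T4 ok"
```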

[Ref 1] Tomoko Ohta, Sampo Pyysalo, Jun'ichi Tsujii, and Sophia Ananiadou (2012). Open-domain Anatomical Entity Mention Detection. To appear in Proceedings of DSSD 2012.

