Cocoa: Compact cover annotator for biological noun phrases

Annotations for some corpora:

We provide Cocoa annotations for several corpora in the Brat-compatible ann/A1 standoff format. The annotations can be viewed directly online through the excellent Brat viewer. For each corpus, you can download the A1 annotations here to view them on your own machine; however, both the original text and the manual annotations must be downloaded separately from the respective corpus sites. All copyrights to the source material remain with their original owners. The Cocoa annotations themselves are freely redistributable under a Creative Commons license.
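Because the standoff files hold only character offsets into the original text, a downloaded annotation file is useful only alongside the matching corpus document. The following is a minimal Python sketch of reading the text-bound ("T") lines of a Brat standoff file and checking them against the document text. The sample sentence, the entity labels `Protein` and `Cell`, and the function names are hypothetical illustrations, not actual Cocoa output or part of any Cocoa tool.

```python
from typing import List, NamedTuple, Tuple


class TextBound(NamedTuple):
    ann_id: str                    # e.g. "T1"
    label: str                     # entity type (hypothetical labels below)
    spans: List[Tuple[int, int]]   # (start, end) character offsets, end exclusive
    surface: str                   # annotated text as stored in the file


def parse_standoff(ann_text: str) -> List[TextBound]:
    """Parse text-bound (T) lines; other annotation line types are skipped."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue
        ann_id, middle, surface = line.split("\t", 2)
        label, offsets = middle.split(" ", 1)
        # Discontinuous annotations separate their fragments with ";".
        spans = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
        entities.append(TextBound(ann_id, label, spans, surface))
    return entities


def matches_document(entity: TextBound, doc: str) -> bool:
    """Check that the offsets select the stored surface text in the document."""
    return " ".join(doc[s:e] for s, e in entity.spans) == entity.surface


# Hypothetical document text and annotation file contents.
doc = "BRCA1 is expressed in breast epithelial cells."
ann = "T1\tProtein 0 5\tBRCA1\nT2\tCell 22 45\tbreast epithelial cells\n"

entities = parse_standoff(ann)
misaligned = [e.ann_id for e in entities if not matches_document(e, doc)]
```

An empty `misaligned` list indicates that the annotation file and the separately downloaded text are in step; a non-empty one usually means the text file differs from the version that was annotated.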

Annotations and links:

The Cellfinder corpus.

Reference: Mariana Neves, Alexander Damaschun, Andreas Kurtz, Ulf Leser. Annotating and evaluating text for stem cell research. Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012) at Language Resources and Evaluation (LREC) 2012.

The Protein Residue Relations Silver Corpus from the BioNLP-corpora.

Reference: Ravikumar K.E., Haibin, L., Cohn, JD, Wall, M.E., Verspoor, K.M. (2011) Pattern Learning Through Distant Supervision for Extraction of Protein-Residue Associations in the Biomedical Literature. The Tenth International Conference on Machine Learning and Applications (ICMLA) 2011, Honolulu, Hawaii, USA, December, 2011.  

The Anatomical Entity Mention (AnEM) corpus.
  • Download Cocoa annotations in the Brat stand-off format, or
  • View online (soon), or view the annotations from an older Cocoa version
  • Notes on evaluation of Cocoa cellular component, cell and all anatomical entity annotations against the manual annotations.

Reference: Tomoko Ohta, Sampo Pyysalo, Jun'ichi Tsujii and Sophia Ananiadou (2012). Open-domain Anatomical Entity Mention Detection. (to appear in proceedings of DSSD 2012)  

The Multi-level event extraction (MLEE) corpus.
  • Download Cocoa annotations in the Brat stand-off format
  • Notes on evaluation of Cocoa protein, chemical, organism, cellular component, cell and all anatomical entity annotations against the manual annotations.

Reference: Sampo Pyysalo, Tomoko Ohta, Makoto Miwa, Han-Cheol Cho, Jun'ichi Tsujii and Sophia Ananiadou (2012). Event extraction across multiple levels of biological organization. Bioinformatics. 28(18):i575-i581  

The Colorado Richly Annotated Full Text Corpus (CRAFT).
  • Download Cocoa annotations in the Brat stand-off format, or
  • View online (soon)
  • Notes on evaluation of Cocoa proteins and protein-parts against the manual annotations.
  • Notes on evaluation of Cocoa cellular component, cell and organism annotations against the manual annotations.

Reference: Bada, M.*, Eckert, M., Evans, D., Garcia, K., Shipley, K., Sitnikov, D., Baumgartner Jr., W. A., Cohen, K. B., Verspoor, K., Blake, J. A., and Hunter, L. E. Concept Annotation in the CRAFT Corpus. Accepted into BMC Bioinformatics.

Reference: Verspoor, K.*, Cohen, K.B.*, Lanfranchi, A., Warner, C., Johnson, H.L., Roeder, C., Choi, J.D., Funk, C., Malenkiy, Y., Eckert, M., Xue, N., Baumgartner Jr., W.A., Bada, M., Palmer, M., Hunter L.E. A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools. BMC Bioinformatics, in press.  

The Arizona Disease corpus, which is annotated for disease names and normalized against UMLS.

  • Download Cocoa annotations in the Brat stand-off format, or
  • View online (soon)
  • Download a script to convert the corpus to a Brat/A1 compatible format.
  • Notes on evaluation of Cocoa disease annotations against the manual annotations.

Reference: Leaman, R., Miller, C., Gonzalez, G. (2009). Enabling Recognition of Diseases in Biomedical Text with Machine Learning: Corpus and Benchmark. Symposium on Languages in Biology and Medicine, 82-89.
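
The evaluation notes listed for each corpus compare Cocoa annotations against the manual annotations. As a rough illustration of what such a comparison involves, the sketch below computes precision, recall and F1 over entity spans. It assumes exact matching on (label, start, end), which is only one possible criterion and is not necessarily the one used in the notes; the spans and the `Disease` label are made-up sample data.

```python
from typing import Iterable, Tuple

Span = Tuple[str, int, int]  # (label, start offset, end offset)


def precision_recall_f1(predicted: Iterable[Span], gold: Iterable[Span]):
    """Score predicted spans against gold spans using exact matching (an assumption)."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up sample data: three manual annotations, four automatic ones, two in common.
gold_spans = [("Disease", 0, 13), ("Disease", 30, 38), ("Disease", 50, 60)]
cocoa_spans = [("Disease", 0, 13), ("Disease", 30, 38),
               ("Disease", 70, 75), ("Disease", 80, 90)]

precision, recall, f1 = precision_recall_f1(cocoa_spans, gold_spans)
```

Relaxed criteria, such as counting overlapping spans as matches, generally give higher scores than exact matching, so the criterion should always be stated alongside the numbers.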


©2012 - NPjoint