Using Cocoa with the Turku Event Extraction System
The Turku Event Extraction System (TEES) is "a free and open source natural language processing system developed for the extraction of events and relations from biomedical text". By default, TEES uses the BANNER tool to annotate protein names in raw text prior to event extraction.
As discussed in our evaluation of Cocoa against the CRAFT corpus, Cocoa and BANNER perform similarly on the named entity recognition (NER) task for proteins: precision is higher with Cocoa, while recall is better with BANNER. It is likely that Cocoa will do better than BANNER for at least some biomedical subdomains. We therefore give below a procedure that allows TEES to use Cocoa instead of BANNER for the NER task.
This procedure involves:
- Adding a line in the Preprocessor.py file to shift the onus of the NER task to Cocoa (from BANNER). This file lives in the "Detectors" subdirectory in TEES. Back up your original file (Preprocessor.py) before proceeding.
- Copying a file, "Cocoa.py", that makes the actual calls to the Cocoa WebAPI. This file should be placed in the "Tools" subdirectory of TEES.
Download both files as a zip archive here. Unzip this archive, and
- copy the Preprocessor.py file to the "Detectors" subdirectory (after backing up the original)
- copy/move the Cocoa.py file to the "Tools" subdirectory.
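The backup and copy steps above can also be scripted. The following is only a minimal sketch, not part of TEES or Cocoa: it assumes the archive unzips into a single flat directory containing both files, and the function name install_cocoa_files and the backup name "Preprocessor.py.orig" are hypothetical choices made for illustration.

```python
import shutil
from pathlib import Path


def install_cocoa_files(unzipped_dir, tees_dir):
    """Back up TEES's Preprocessor.py, then copy in the two downloaded files.

    Assumes (hypothetically) that unzipped_dir directly contains
    Preprocessor.py and Cocoa.py, and that tees_dir is the TEES root
    holding the "Detectors" and "Tools" subdirectories.
    """
    unzipped = Path(unzipped_dir)
    tees = Path(tees_dir)
    detectors = tees / "Detectors"
    tools = tees / "Tools"

    original = detectors / "Preprocessor.py"
    backup = detectors / "Preprocessor.py.orig"  # hypothetical backup name

    # Back up only once, so re-running the script never overwrites
    # the pristine original with an already modified file.
    if original.exists() and not backup.exists():
        shutil.copy2(original, backup)

    # Replace the preprocessor and add the Cocoa wrapper.
    shutil.copy2(unzipped / "Preprocessor.py", original)
    shutil.copy2(unzipped / "Cocoa.py", tools / "Cocoa.py")
    return backup
```

It would be called with the directory the archive was unzipped into and the TEES root directory as the two arguments. To undo the change, copy the backup file over Preprocessor.py.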
That should be all. Run the system against a PubMed article to check that Cocoa is being called for the NER task:
python classify.py -m GE -i 9668063 -o OUTSTEM
as suggested in the TEES Quick start guide.
The "Cocoa.py" file is essentially the "Banner.py" file with some small modifications. We do not claim any credit or ownership for this file; copyright and all other rights to the "Cocoa.py" file are governed by the same license as the rest of the TEES system.