Abstract
This paper describes modifications to the Natural Language Toolkit (NLTK), a widely used natural language processing package, that improve information extraction results on helicopter maintenance records. It also outlines the components of a tool under development for machine analysis of the free-text fields of V-22 Osprey maintenance records. The authors find that adapting existing natural language processing software to the peculiarities of the language used in maintenance records yields substantive improvements in information extraction accuracy. In particular, modifying an existing text pre-processor to 1) accept multi-sentence inputs, 2) treat all code tokens as the same token, and 3) ignore distinctions in punctuation raised part-of-speech tagging accuracy from 92.49% to 96.59%; entity chunking precision in turn rose from 91.5% to 92.3%.
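
The three pre-processor changes listed above can be illustrated with a minimal sketch. It is not the authors' implementation and does not use NLTK. The placeholder symbols `<CODE>` and `<PUNCT>`, the function names, and the rule that a "code" is any token containing a digit are all assumptions made here for illustration.

```python
import re

# Hypothetical placeholder symbols; the paper does not name them.
CODE = "<CODE>"
PUNCT = "<PUNCT>"

# A token is a run of letters/digits (optionally joined by '-' or '/'),
# or any single non-space, non-alphanumeric symbol.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[-/][A-Za-z0-9]+)*|[^\sA-Za-z0-9]")


def is_code(token):
    """Assumed heuristic: a token containing a digit is a code (e.g. a part number)."""
    return any(ch.isdigit() for ch in token)


def preprocess(text):
    """Normalize a free-text field into a single token sequence.

    1) The whole field is processed at once, so multi-sentence input is accepted.
    2) Every code token is mapped to the same CODE symbol.
    3) Every punctuation mark is mapped to the same PUNCT symbol.
    """
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if is_code(tok):
            tokens.append(CODE)
        elif not any(ch.isalnum() for ch in tok):
            tokens.append(PUNCT)
        else:
            tokens.append(tok)
    return tokens
```

For example, `preprocess("Removed pump P/N 4567-12. Installed new pump; ops check good.")` maps `4567-12` to `<CODE>` and maps both the periods and the semicolon to `<PUNCT>`, so a downstream tagger would see one symbol for all codes and one for all punctuation.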
