Healthcare text can be challenging to work with. The transformations, simplifications, and shortcuts taken to store this data for secondary use (e.g., research) cause major problems for downstream use. These upstream failures might strip spaces (thereby causing run-together words), remove other formatting characters (e.g., newlines and tabs), and combine what were once pretty-looking tables…
Category: nlp
Building Language Rules in spaCy
spaCy provides a number of useful methods for exploring and creating patterns after a particular text or document has been read. To see this in action, let’s use spaCy to build some rules on the more computational-linguistics side of NLP. So, for those less interested in language, forgive a brief digression into Polish. In…
Using spaCy for Sentence Splitting
By default, spaCy carries around a powerful battery of pipelines and swings these mighty chainsaws at every passing tree and twig. Sometimes, however, you only want a small pruner to accomplish some smaller task. Can spaCy still work in such a use case? For example, suppose that all I want from spaCy are my documents…