nlp – Foggy Programmer

Visualising Word Embeddings: Exploring TensorFlow’s Embedding Projector

Posted on October 18, 2025 / October 21, 2025 by Foggy Programmer

One of my regular tasks in presentations is to dedicate a couple of slides to introducing word embeddings. Words are, unfortunately, arbitrary in their spelling (and, relatedly, their pronunciation). For example, if we were to forget our knowledge of English and glance at the English words rock, sock, and rook, we might assume that they are…

Reviewing Regex Matches with Context Window in `polars`

Posted on October 3, 2025 / September 26, 2025 by Foggy Programmer

In natural language processing tasks (especially when building regular expression-based tools), it’s important to be able to review text efficiently. When I first started, the default approach was reviewing in an Excel workbook. This involved a few columns of metadata, a giant blurb of text to be reviewed, followed by a column to record the…

Fixing Healthcare Text for NLP: Spell Correction and Word Segmentation

Posted on October 11, 2024 by Foggy Programmer

Healthcare text can be challenging to work with. The transformations, simplifications, and shortcuts taken to store this data for secondary use (e.g., research) cause major problems for downstream use. These upstream failures might strip spaces (thereby causing run-together words), remove other formatting characters (e.g., newlines and tabs), and combine what were once pretty-looking tables…

Building Language Rules in spaCy

Posted on April 22, 2023 / April 1, 2023 by Foggy Programmer

spaCy provides a number of useful methods for exploring and creating patterns after a particular text or document has been read. To see this in action, let’s use spaCy to build some rules in the more computational linguistic side of NLP. So, for those less interested in language, forgive a brief digression into Polish. In…

Using spaCy for Sentence Splitting

Posted on April 15, 2023 / April 1, 2023 by Foggy Programmer

By default, spaCy carries around a powerful battery of pipelines and swings these mighty chainsaws at every passing tree and twig. Sometimes, however, you only want a small pruner to accomplish some smaller task. Can spaCy still work in such a use case? For example, suppose that all I want from spaCy are my documents…