Incorporating regular expressions has been clunky. Let’s imagine that we need to search for a few regular expressions in some text and then perform a task when a term is found. Unfortunately, we can’t just use the match directly because, if the pattern is not in the text, the result is None. So, before using the…
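As a rough illustration of the problem described in this excerpt, here is a minimal sketch of guarding against a `None` result before using a match. The patterns, the sample text, and the helper name `find_terms` are hypothetical, and the assignment expression (`:=`) assumes Python 3.8 or later; the full post may well take a different approach.

```python
import re

# Hypothetical patterns, chosen only for illustration.
PATTERNS = [
    re.compile(r"\bcough\b", re.IGNORECASE),
    re.compile(r"\bfever\b", re.IGNORECASE),
]


def find_terms(text):
    """Return the matched text for each pattern that occurs in `text`."""
    found = []
    for pattern in PATTERNS:
        # `search` returns None when the pattern is absent, so check the
        # result before calling any method on the match object.
        if (m := pattern.search(text)) is not None:
            found.append(m.group())
    return found


print(find_terms("Patient reports Fever and chills."))
```

Binding the match and testing it in the same `if` keeps the guard and the use of the match on adjacent lines, which is one way to reduce the clunkiness.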
Making Better Regular Expressions
I use a lot of regular expressions in my work. They are very powerful for extracting, replacing, or locating text strings of interest, particularly because of their flexibility: character classes, case insensitivity, etc. Take a simple use case: let’s find all the words (letter-only sequences) in some text. Regular expressions do have some…
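The "find all the words" use case mentioned in this excerpt might look something like the sketch below. It assumes "letter-only" means ASCII letters, and the function name `find_words` and the sample sentence are hypothetical.

```python
import re

# One or more ASCII letters; digits and punctuation act as separators.
WORD_PATTERN = re.compile(r"[A-Za-z]+")


def find_words(text):
    """Return every letter-only sequence in `text`, in order."""
    return WORD_PATTERN.findall(text)


print(find_words("Hello, world! It's 2022."))
# ['Hello', 'world', 'It', 's']
```

Note that the apostrophe splits "It's" into two words, which is a typical example of where a simple pattern needs refining.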
Installing Python on Windows
Python 3.11.0 was just released. In honor of this, I wanted to write a quick walkthrough of installing Python on Windows. The process is relatively straightforward and only takes a couple of minutes. In addition, I’ll provide the steps for downloading and installing Microsoft Build Tools for Visual Studio, which is sometimes required for compiling…
Dynamic Regex Queries in Pandas
I encountered an issue recently in which I wanted to dynamically retain only those rows which matched a group of regular expressions, or, in some cases, to be able to exclude rows matching a particular set of regular expressions. This is relatively straightforward should the regular expressions be known in advance. Let’s begin by setting…
Experiments in test-driven development for NLP?
Perhaps inspired by Brian Okken’s pytest book, I have been experimenting with a new approach to writing code. Most of my work consists of a long list of one-off scripts which serve a single purpose: moving data around, performing some relatively simple NLP operation, etc. While they will likely be run a few times (e.g.,…
Ignoring Tests with `pytest.param`
Testing applications is very important, but must be creatively exercised — perhaps we can follow the wearied expression of testing being more of an art than a science? Even packages like hypothesis still require some creative initialisation. What exactly should I test? How do I test that, and only that? Perhaps there are tests that…
Upgrading to Python 3.10
I just upgraded my Windows machines to Python 3.10.1. I’ve shied away from 3.X.0 releases ever since one of them broke something on Windows — I don’t recall the version, or the reason, and I’d assume release testing has improved so that it’s unlikely to recur, but I suppose I’ve become superstitious in my age….
Retrieve UMLS Data with API Key
The basic need I have is to convert the codes in a MedDRA dataset to CUIs (UMLS concept unique identifiers). If there were only 10 or so, I’d look them up on the Metathesaurus manually…but I have a dataset of 155 related to COVID-19. Once I have the CUIs, I can limit the output from…
Disable New Microsoft Office “Save As” Menu
Perhaps I’m old-fashioned, or perhaps I haven’t invested enough time learning how great Microsoft Office’s new(-ish) save dialog is. Typically, when I want to save something, I want to type/paste in the path where I want to save my file, or click through the folders as I’m accustomed to in Windows Explorer. I can’t really figure out…
Default Values for max and min
I rely pretty heavily on Python’s min and max functions when trying to take the highest or lowest values from a particular algorithm. For example, a regular expression extracts scores (these could be grades, number of pages in a book, distance run, etc.) from an input document. The document may contain multiple scores (e.g., describing…
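A minimal sketch of the scenario in this excerpt, assuming scores are plain integers and that the built-in `default` keyword of `max` is the mechanism of interest. The pattern, the function name `highest_score`, and the sample documents are hypothetical.

```python
import re

# Assumption for illustration: a score is any run of digits.
SCORE_PATTERN = re.compile(r"\d+")


def highest_score(document):
    """Return the largest score in `document`, or None if there are none."""
    scores = [int(s) for s in SCORE_PATTERN.findall(document)]
    # Without `default`, max() raises ValueError on an empty sequence.
    return max(scores, default=None)


print(highest_score("Scores: 85, 92, 78"))  # 92
print(highest_score("no scores here"))      # None
```

The same `default` keyword is accepted by `min`, so the lowest score can be handled the same way.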