This blog runs on WordPress, and, despite the hours I’ve invested in generating this great content, the nearly one thousand ‘visitors’ to my website each day seem to really like my sign-in page: https://SITE/wp-admin. And they like to experiment with random username and password combinations… I didn’t contribute at all to that page, and so…
Moving Large Program Data to Another Drive with Links
I regularly run a program (on Windows) that downloads and indexes quite a bit of data. While this makes the application run quite smoothly, it does take up a lot of disk space. Recently, a change modified the save location for this data from my spacious D:\ to the rather more cramped conditions of C:\Users\foggy\AppData\Local….
Editing On GitHub
I’ll sometimes edit directly on GitHub. It’s probably a package I’m using and I need to remove or add a line or two. For example, the prompt for this posting was to remove a line that added a log file globally using loguru. It seemed a good idea at the time, when the file was…
Walruses and Regular Expressions
Incorporating regular expressions has been clunky. Let’s imagine that we need to search for a few regular expressions in some text and then perform a task when a term is found. In code: Unfortunately, we can’t just use the match because, if the pattern is not in text, the result is None. So, before using the…
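As a rough illustration of the idea in this excerpt (not the code from the post itself), the walrus operator can bind the match and check it for None in a single expression. The pattern names, the sample text, and the `find_terms` helper are all hypothetical:

```python
import re

# Hypothetical patterns, chosen only for illustration.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.]+@[\w.]+\.\w+\b"),
}


def find_terms(text):
    """Return (label, matched text) for each pattern found in text."""
    found = []
    for label, pattern in PATTERNS.items():
        # search() returns None when the pattern is absent; the walrus
        # operator binds the result and tests it in the same expression.
        if (m := pattern.search(text)) is not None:
            found.append((label, m.group()))
    return found


results = find_terms("Released on 2022-10-24, contact me@example.com")
```

Assignment expressions require Python 3.8 or later; on older versions the match would need to be assigned on its own line before the `if`.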
Making Better Regular Expressions
I use a lot of regular expressions in my work. They are very powerful for extracting, replacing, or locating text strings of interest, particularly in their flexibility. Character classes, case insensitivity, etc. are very powerful. Take a simple use case: let’s find all the words (letter-only sequences) in some text: Regular expressions do have some…
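A minimal sketch of the simple use case mentioned above, assuming "letter-only" means ASCII letters (the sample text is made up for illustration):

```python
import re

# Hypothetical sample text.
text = "Python 3.11.0 was just released, and it's fast."

# One or more ASCII letters in a row; digits and punctuation split words.
words = re.findall(r"[A-Za-z]+", text)
```

Note that the apostrophe in "it's" splits the word into "it" and "s", which is the sort of edge case a letter-only pattern runs into.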
Installing Python on Windows
Python 3.11.0 was just released. In honor of this, I wanted to write a quick walkthrough of installing Python on Windows. The process is relatively straightforward and only takes a couple of minutes. In addition, I’ll provide the steps for downloading and installing Microsoft Build Tools for Visual Studio, which is sometimes required for compiling…
Dynamic Regex Queries in Pandas
I encountered an issue recently in which I wanted to dynamically retain only those rows which matched a group of regular expressions, or, in some cases, to be able to exclude rows matching a particular set of regular expressions. This is relatively straightforward should the regular expressions be known in advance. Let’s begin by setting…
Experiments in test-driven development for NLP?
Perhaps inspired by Brian Okken’s pytest book, I have been experimenting with a new approach to writing code. Most of my work consists of a long list of one-off scripts which serve a single purpose: moving data around, performing some relatively simple NLP operation, etc. While they will likely be run a few times (e.g.,…
Ignoring Tests with `pytest.param`
Testing applications is very important, but it must be exercised creatively; perhaps we can follow the weary expression that testing is more of an art than a science? Even packages like hypothesis still require some creative initialisation. What exactly should I test? How do I test that, and only that? Perhaps there are tests that…
Upgrading to Python 3.10
I just upgraded my Windows machines to Python 3.10.1. I’ve shied away from 3.X.0 releases ever since one of them broke something on Windows — I don’t recall the version, or the reason, and I’d assume release testing has improved so that it’s unlikely to recur, but I suppose I’ve become superstitious in my age….