In natural language processing tasks (especially when building regular expression-based tools), it’s important to be able to review text efficiently. When I first started, the default approach was reviewing in an Excel workbook. This involved a few columns of metadata, a giant blurb of text to be reviewed, followed by a column to record the…
Coding with a Chatbot for Dummies
My first attempt to code with a chatbot was several years ago and involved using ChatGPT to do a couple of data transformations using pandas. The dataset was not large. My procedure was something like: Regarding #3, I think I started by trying to type it out myself, the idea of making sure I understood…
Evaluating Generative Chatbots
I was at an epidemiology conference about a month ago – not a typical location for a data scientist, but circumstances found me there. A number of sessions embraced a certain (albeit nervous) enthusiasm regarding access to decoder-only transformers, often called ‘AI’ or ‘large language models’. There is a certain buzz and excitement…
`polars`: `replace_strict` vs `replace`
When I was first learning polars, I had an immediate need to replace a certain column with a mapping. This often happens in data science where a variable is stored using a numerical representation rather than a string to save space, simplify filtering, etc. The ‘mapping’ is stored in either documentation or some sort…
The Journey from ‘Getting Started’ to Expert
One of the challenges when picking up a new programming tool or package is moving from the very basic ‘Getting Started’ page to the vast array of API documentation. The middle ground is immense and disorienting. It takes effort and persistence to advance, to actually learn the technology. The cognitive load is heavy and…
Installing a Project from GitHub with `uv`
Here’s a short guide (targeted primarily at myself) on how to get a project from GitHub (or some other git-based repository) onto my machine. This guide assumes you have git and uv installed and added to your path. Also, the secret to understanding uv is seeing the lockfile as foundational rather than the packages currently…
Starting `polars`: A New Paradigm for Learning Technologies
I’ve been using polars a lot more, recently. This was a library I’d intended to learn a while ago, but it’s very difficult to start using (i.e., learning) a new package for a project, particularly as deadlines begin to loom. My clients don’t really care whether I use polars or pandas, so long as I…
Should `8.475` round to `8.48` or `8.47`?
If you use Python’s built-in round, the answer is easy. round(8.475, 2) in theory looks at the 5 and will therefore round to the nearest even number (i.e., 8 not 7) so the result should be 8.48. EOM. But when using pandas I get 8.48, while polars gives me 8.47. Why the difference? First,…
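The excerpt above is cut off before its explanation, so the following is only a rough sketch of one way to probe the question, not the post's own analysis. It assumes the standard-library `decimal` module is an acceptable reference: it compares the value a float actually stores for `8.475` with the decimal value as written, and applies round-half-to-even to the written value.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# The exact binary value that the float literal 8.475 actually stores.
stored = Decimal(8.475)

# The decimal value as written on the page, with no binary approximation.
written = Decimal("8.475")

# Round-half-to-even applied to the exact decimal tie.
tie_result = written.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print(stored)             # full expansion shows which side of 8.475 the float sits on
print(stored == written)  # False: 8.475 has no exact binary representation
print(tie_result)         # 8.48
print(round(8.475, 2))    # built-in round works from the stored value, not the written one
```

Because the stored float is not exactly 8.475, any library that rounds it is not really breaking a tie, which is one plausible reason different tools can disagree on the second decimal.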
How to Determine the Flags of a Compiled Regular Expression?
I recently had the challenge of determining which flags had been set in a compiled regular expression. In other words, write a function that, given a compiled regular expression (e.g., re.compile('test', re.I | re.M)), determines that the flags were re.I and re.M. A first attempt might assume that the class re.Pattern has a flags attribute…
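The post's own solution is cut off in the excerpt above, so what follows is just one possible approach rather than the author's. It relies on the integer `flags` attribute of a compiled pattern and on iterating over `re.RegexFlag`; the helper name `compiled_flags` is made up for illustration.

```python
import re


def compiled_flags(pattern):
    """Return the set of individual re.RegexFlag members set on a compiled pattern."""
    # pattern.flags is a single integer; test each flag's bit against it.
    return {
        flag
        for flag in re.RegexFlag
        if flag.value and pattern.flags & flag.value
    }


compiled = re.compile('test', re.I | re.M)
flags = compiled_flags(compiled)
print(sorted(flag.name for flag in flags))
```

Note that a pattern compiled from a `str` also carries `re.UNICODE` implicitly, so the result can contain more flags than were passed to `re.compile`.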
Getting Started with `uv`
I started using uv as my default package manager. I’d only ever used pip, but some echoes of uv had been reverberating in my head, so I gave it a try. And, after a couple of months, I’m still using it. I’ve enjoyed the speed of installation and general dependency management, though have endured a few…