‘Namespaces are one honking great idea!’ Namespaces are powerful as they allow functions with the same names to exist in the same script, while also providing relatively straightforward attribute access (pathlib.Path) and static analysis assistance. Further, there are many times in which having access to a namespace rather than, e.g., a dict could be wildly useful. Introducing:…
Category: python
Exploring `polars` Masks
When working with arrays and dataframes, a ‘mask’ is a filter that selects a subset of the source array or dataframe. This is often represented as a boolean array or Series like: [True, False, False, True]. When evaluated against a DataFrame, we’ll get the first and fourth rows back since these are both True. Since…
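As a minimal sketch of the idea, here is the same mask applied in plain Python rather than through the polars API. The `rows` list is a hypothetical stand-in for a DataFrame's rows, and `itertools.compress` is used only to illustrate how a boolean mask selects items.

```python
from itertools import compress

# Hypothetical stand-in for the four rows of a DataFrame.
rows = ["row 1", "row 2", "row 3", "row 4"]

# The boolean mask from the text: keep the first and fourth rows.
mask = [True, False, False, True]

# compress() yields each item whose matching mask entry is truthy.
selected = list(compress(rows, mask))
print(selected)  # ['row 1', 'row 4']
```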
Reviewing Regex Matches with Context Window in `polars`
In natural language processing tasks (especially when building regular expression-based tools), it’s important to be able to review text efficiently. When I first started, the default approach was reviewing in an Excel workbook. This involved a few columns of metadata, a giant blurb of text to be reviewed, followed by a column to record the…
`polars`: `replace_strict` vs `replace`
When I was first learning polars, I had an immediate need to replace a certain column with a mapping. This often happens in data science where a variable is stored using a numerical representation rather than a string to save space, simplify filtering, etc. The ‘mapping’ is stored in either documentation or some sort…
Installing a Project from GitHub with `uv`
Here’s a short guide (targeted primarily at myself) on how to get a project from GitHub (or some other git-based repository) onto my machine. This guide assumes you have git and uv installed and added to your path. Also, the secret to understanding uv is seeing the lockfile as foundational rather than the packages currently…
Should `8.475` round to `8.48` or `8.47`?
If you use Python’s built-in round, the answer is easy. round(8.475, 2) in theory looks at the 5 and will therefore round to the nearest even number (i.e., 8 not 7) so the result should be 8.48. EOM. But, when using pandas I get 8.48, but polars gives me 8.47 — why the difference? First,…
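One way to probe a question like this is with the standard library's `decimal` module. The sketch below is only an illustration, not a description of what pandas or polars do internally: it compares half-even rounding of the exact decimal 8.475 with half-even rounding of the binary float that the literal `8.475` is actually stored as.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# The exact decimal value 8.475, built from a string.
exact = Decimal("8.475")

# The value the float literal 8.475 is actually stored as in binary.
stored = Decimal(8.475)

cents = Decimal("0.01")

# Half-even rounding of the exact decimal: a true tie, so round to the even digit.
exact_rounded = exact.quantize(cents, rounding=ROUND_HALF_EVEN)

# The stored float sits slightly below 8.475, so it is not a tie at all.
stored_rounded = stored.quantize(cents, rounding=ROUND_HALF_EVEN)

print(exact_rounded)   # 8.48
print(stored < exact)  # True
print(stored_rounded)  # 8.47
```

Both answers can therefore come out of the same rounding rule, depending on whether the input is treated as the decimal 8.475 or as its nearest binary float.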
Getting Started with `uv`
I started using uv as my default package manager. I’ve only ever used pip, but some echoes of uv had been reverberating in my head, so I gave it a try. And, after a couple months, I’m still using it. I’ve enjoyed the speed of installation and general dependency management, though have endured a few…
Optimizing `to_sql` Method in `pandas`
My environment requires a lot of database work in SQL Server to access data. The data I work with (i.e., text) isn’t stored particularly efficiently so I will sometimes need to pull down data, perform some manipulations, re-upload, do some joins, and download again. Sure, there are a number of shiny ‘solutions’ that would make…
Connecting to Teradata with Python
Teradata is a relational database released by Teradata Corporation. I have some data that lives there and occasionally need to access it — how to approach this with Python? Teradata Corporation publishes the teradatasql library to provide a PEP 249-compatible interface to the Teradata database. This Python package is actively maintained internally. Connection Basic usage…
`seaborn`: The Basics
seaborn is a Python graphing library which interacts incredibly well with pandas. Yes, pandas does have its own plotting functions accessible from df.plot, which are particularly easy to build and (quite conveniently) don’t require another external library. I’ve found pandas’ plots particularly useful to do quick checks and calculations while doing some other aspect of…