If you use Python’s built-in round, the answer is easy. round(8.475, 2) in theory looks at the 5 and will therefore round to the nearest even digit (i.e., 8 not 7), so the result should be 8.48. EOM. But when using pandas I get 8.48, while polars gives me 8.47. Why the difference? First,…
Category: python
Getting Started with `uv`
I started using uv as my default package manager. I’ve only ever used pip, but some echoes of uv had been reverberating in my head, so I gave it a try. And, after a couple months, I’m still using it. I’ve enjoyed the speed of installation and general dependency management, though have endured a few…
Optimizing `to_sql` Method in `pandas`
My environment requires a lot of database work in SQL Server to access data. The data I work with (i.e., text) isn’t stored particularly efficiently so I will sometimes need to pull down data, perform some manipulations, re-upload, do some joins, and download again. Sure, there are a number of shiny ‘solutions’ that would make…
Connecting to Teradata with Python
Teradata is a relational database released by Teradata Corporation. I have some data that lives there and occasionally need to access it — how to approach this with Python? Teradata Corporation publishes the teradatasql library to provide a PEP 249-compatible interface to the Teradata database. This Python package is actively maintained internally. Connection Basic usage…
`seaborn`: The Basics
seaborn is a Python graphing library which interacts incredibly well with pandas. Yes, pandas does have its own plotting functions accessible from df.plot, which are particularly easy to build and (quite conveniently) don’t require another external library. I’ve found pandas’ plots particularly useful to do quick checks and calculations while doing some other aspect of…
Logging Function Parameters with `loguru`
Log files can often be useful sources of historical information about how programs run. I have found them sitting next to datasets and used them to get more information on the provenance of the dataset. Perhaps I could add a function that would log all of the parameters that were run? Sure, a configuration file…
`argparse`: Optional Argument and Flag?
I was recently modifying a program that uses argparse to collect command line options, in order to add an option to enable a ‘testrun’. The script begins by copying a large cohort to a server before taking several steps manipulating it. When I ran it with a new configuration file, there was a misconfigured flag,…
Extracting a Table from PDF with Tabula
An email arrives with an attached PDF and a request that some multi-page embedded table be extracted into Excel. For example, the following presents a short snippet: How would you handle it? Sure, this table is relatively trivial to manually extract, but imagine a PDF continuing for several pages. Fortunately, there are several Python libraries…
Extracting a Table from a PDF with Camelot
An email arrives with an attached PDF and a request that some multi-page embedded table be extracted into Excel. For example, the following presents a short snippet: How would you handle it? Sure, this table is relatively trivial to manually extract, but imagine a PDF continuing for several pages. Fortunately, there are several Python libraries…
Sharing `click` Options
One nuisance with building command line options in argparse, click, or any other system is duplicates in the parameter list. When using argparse, I would call various functions to add a set of arguments to my ArgumentParser (e.g., for shared output parameters/configuration): Treat the above (and below) as pseudocode, but hopefully they get the idea…