I was recently modifying a program which uses argparse to collect command line options, in order to add an option to enable a ‘testrun’. The script begins by copying a large cohort to a server before taking several steps to manipulate it. When I ran it with a new configuration file, there was a misconfigured flag,…
Extracting a Table from PDF with Tabula
An email arrives with an attached PDF and a request that some multi-page embedded table be extracted into Excel. For example, the following presents a short snippet: How would you handle it? Sure, this table is relatively trivial to manually extract, but imagine a PDF continuing for several pages. Fortunately, there are several Python libraries…
Extracting a Table from a PDF with Camelot
An email arrives with an attached PDF and a request that some multi-page embedded table be extracted into Excel. For example, the following presents a short snippet: How would you handle it? Sure, this table is relatively trivial to manually extract, but imagine a PDF continuing for several pages. Fortunately, there are several Python libraries…
Running `prodigy` with Encryption and Authentication
I have a prodigy task ready for internal (not internet-wide) review and have it running on a server with the host set to 0.0.0.0, but want to keep the contents secure so that only the specified reviewer can see and interact with the review process. For context, let’s suppose that I’m working on a Windows…
Creating a Self-Signed Cert
One component of the security of a website is to ensure that client and server are communicating with each other without anyone intercepting the traffic. This is the reason for the ‘s’ tacked onto the http of your URL. On the great wide web, there are a number of certificate authorities, but my intranet lacks…
Sharing `click` Options
One nuisance with building command line options in argparse, click, or any other system is duplicates in the parameter list. When using argparse, I would call various functions to add a set of arguments to my ArgumentParser (e.g., for shared output parameters/configuration): Treat the above (and below) as pseudocode, but hopefully they get the idea…
A Quickety `click` Tutorial
I’ve always used argparse. I’ve tried a few others, but it’s hard to beat a built-in argument parser with the power and flexibility of argparse. Recently, however, I’ve found click appearing increasingly in my requirements.txt and pyproject.toml. While I have not explored the depths of click (most of my use cases don’t involve a high…
Simplifying File Character Encodings
A recent project required me to work with a number of character encodings. And, to quote a colleague who has done more than his share of this dirty work: ‘Character sets are a b****’. Yes, they are. This particular project had free text stored in one encoding, a dependency which required input in a different…
Context Managers and the Fencepost Problem
In this write-up, I want to discuss a more encapsulated solution to the fencepost problem which relies on Python’s context managers. By ‘encapsulated’, I mean ‘hidden from the user’, or ‘handled by the object’ in an object oriented programming sense. Before starting, let’s digress briefly into the fencepost problem (at least as how I was…
zipfile — Work with ZIP archives
Python is probably not your first thought when it comes to opening zip archives or compressing directories. In fact, if you’re like me, zip means something rather different… For most zip archive tasks, your favourite shell or windowed GUI is all you need. In fact, if you want Python to emulate this…