When working with arrays and dataframes, a ‘mask’ is a filter that selects a subset of the source array or dataframe. It is usually represented as a boolean array or Series such as [True, False, False, True]. Applied to a four-row DataFrame, this mask returns the first and fourth rows, since those are the positions marked True. Since…
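As a rough, library-free illustration of the idea in this excerpt, a boolean mask can be sketched in plain Python with `itertools.compress`. The `rows` list and its values are hypothetical stand-ins for dataframe rows, and this is not how any particular dataframe library implements filtering; it only shows the selection logic.

```python
from itertools import compress

# Hypothetical stand-ins for four dataframe rows.
rows = ["row_a", "row_b", "row_c", "row_d"]

# The mask from the text: keep the first and fourth positions.
mask = [True, False, False, True]

# compress() yields each item whose matching mask entry is truthy.
selected = list(compress(rows, mask))
print(selected)  # ['row_a', 'row_d']
```

The mask and the data must line up position by position; `compress` stops at the shorter of the two, whereas dataframe libraries generally expect the lengths to match.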
Category: polars
Reviewing Regex Matches with Context Window in `polars`
In natural language processing tasks (especially when building regular expression-based tools), it’s important to be able to review text efficiently. When I first started, the default approach was reviewing in an Excel workbook. This involved a few columns of metadata, a giant blurb of text to be reviewed, followed by a column to record the…
`polars`: `replace_strict` vs `replace`
When I was first learning polars, I had an immediate need to replace the values in a certain column using a mapping. This often happens in data science, where a variable is stored using a numerical representation rather than a string to save space, simplify filtering, etc. The ‘mapping’ is stored in either documentation or some sort…