By default, spaCy carries around a powerful battery of pipeline components and swings these mighty chainsaws at every passing tree and twig. Sometimes, however, you only want a small pruner to accomplish some smaller task. Can spaCy still work in such a use case? For example, suppose that all I want from spaCy are my documents…
Category: python
spaCy: The Basics
I learned much of my natural language processing using Python’s `nltk` library, which, coupled with the NLTK book (https://www.nltk.org/book/), provides a great introduction to the topic. When I hit industry, however, I never really found a use for it, nor could I motivate myself to learn the intricacies of creating a corpus from my own dataset. Many…
How Python Finds Your Imports
It seems easy. I need a package, say pandas, so I run `pip install pandas`. Then, at the top of my file, I can get access to this library with a simple import: `import pandas as pd`. But how does Python determine where the package is located? First, Python will check if…
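As a rough illustration (a sketch, not the post's own code), you can inspect the directories Python's path-based finder searches via `sys.path`, and ask `importlib.util.find_spec` where a top-level module would be loaded from without executing it. The helper `locate_module` is a hypothetical name, and the standard library's `json` stands in for pandas so the example runs anywhere:

```python
import importlib.util
import sys
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return where a top-level module would be loaded from, or None if it can't be found."""
    # find_spec consults sys.modules and then the same finders an import would use,
    # without executing the module's code (for top-level names)
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # origin is a file path for source modules, or a marker such as "built-in"
    return spec.origin


# Directories the path-based finder searches, in order
search_path = list(sys.path)
for entry in search_path:
    # An empty string entry stands for the current working directory
    print(entry or "(current working directory)")

print(locate_module("json"))                # path to the stdlib json package's __init__.py
print(locate_module("sys"))                 # "built-in" on CPython
print(locate_module("no_such_module_xyz"))  # None
```

Using `find_spec` rather than a bare `import` keeps the lookup side-effect free, which is handy when you only want to know *which* copy of a package would win.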
string — Common string operations (Part 1: methods)
Python provides a number of methods (on the built-in `str` type) and constants (in the `string` module) for manipulating strings. These work on all strings in Python, which are created using quotation marks: 'single', "double", '''triple-single''', and """triple-double"""-quoted text. Strings can also be created by applying the built-in `str()` function to any other data type. In Python, strings are immutable…
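A short sketch of the split between `str` methods and `string` constants, with made-up sample text. It also shows that the four quoting styles produce ordinary, equal `str` objects, and that methods return new strings rather than modifying the original:

```python
import string

text = "  Hello, World  "

# The four quoting styles all produce ordinary str objects
same = 'abc' == "abc" == '''abc''' == """abc"""   # True

# str methods return new strings; text itself never changes because str is immutable
stripped = text.strip()        # "Hello, World"
upper = stripped.upper()       # "HELLO, WORLD"
parts = stripped.split(", ")   # ["Hello", "World"]

# The string module supplies constants such as string.punctuation
no_punct = "".join(ch for ch in stripped if ch not in string.punctuation)  # "Hello World"

# str() converts other data types to strings
as_text = str(3.5)             # "3.5"

print(repr(text), same, stripped, upper, parts, no_punct, as_text)
```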
How Virtual Environments Work (on Windows)
Brett Cannon wrote a short (and quite interesting) post on virtual environments and their context, though it focused on their application to a Unix-based OS rather than Windows. I’d like to summarize that content and adapt it to Windows. History: Why do we have virtual environments? This may be a perplexing question to someone…
pathlib — Object-oriented filesystem paths
The `pathlib` module was introduced in Python 3.4 (see PEP 428). More accurately, `pathlib` was a third-party module that was added to the standard library (i.e., the packages that come with every install of Python, unless some are excluded, e.g., on smaller devices). It was attempting to provide a more friendly,…
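For a taste of the object-oriented style, here is a small sketch. The directory and file names are hypothetical, and everything happens inside a temporary directory so the example is self-contained. It uses `Path`'s `/` operator for joining, attributes for path components, and the read/write helpers:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # The / operator joins path segments, replacing os.path.join
    config = base / "project" / "settings.txt"   # hypothetical file name

    # Create the parent directory, then write and read text without calling open()
    config.parent.mkdir(parents=True, exist_ok=True)
    config.write_text("debug = true\n")
    contents = config.read_text()

    # Path components are attributes rather than the result of string slicing
    name, stem, suffix = config.name, config.stem, config.suffix

    # rglob() recursively yields matching Path objects
    found = sorted(p.name for p in base.rglob("*.txt"))

    print(contents.strip(), name, stem, suffix, found)
```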
Walruses and Regular Expressions
Using regular expression matches in conditional logic has been clunky. Let’s imagine that we need to search for a few regular expressions in some text and then perform a task when a term is found. Unfortunately, we can’t just use the match because, if the pattern is not in the text, the result is `None`. So, before using the…
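A minimal sketch of where this leads, assuming Python 3.8+ for the assignment expression (`:=`, the "walrus" operator). The patterns and the messages returned are hypothetical stand-ins for "perform a task":

```python
import re


def describe(text: str) -> str:
    """Report which of several hypothetical patterns matches first."""
    # re.search returns None when nothing matches, so each result must be checked
    # before calling .group(). The walrus operator binds the match and tests it in
    # the same expression, which keeps this a flat if/elif chain instead of a ladder
    # of nested "m = ...; if m: ... else:" blocks.
    if m := re.search(r"(\d{4})-(\d{2})-(\d{2})", text):
        return f"date in year {m.group(1)}"
    elif m := re.search(r"#(\d+)", text):
        return f"order number {m.group(1)}"
    elif m := re.search(r"[\w.]+@[\w.]+", text):
        return f"email address {m.group()}"
    return "no match"


for sample in ["Shipped 2022-10-24", "Order #1234", "Contact me@example.com", "Nothing here"]:
    print(describe(sample))
```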
Making Better Regular Expressions
I use a lot of regular expressions in my work. They are powerful tools for extracting, replacing, or locating strings of interest, and much of that power comes from their flexibility: character classes, case insensitivity, and so on. Take a simple use case: let’s find all the words (letter-only sequences) in some text. Regular expressions do have some…
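The "find all the words" use case might look something like the following. This is a sketch rather than the post's original snippet, and the sample text is made up. Note that `[A-Za-z]+` is ASCII-only, whereas `[^\W\d_]+` (word characters minus digits and the underscore) matches letters in any script under Python's default Unicode matching:

```python
import re

text = "Regex isn't magic: naïve users write 3 patterns, experts write 30."

# ASCII letters only: "isn't" and "naïve" are each split in two
ascii_words = re.findall(r"[A-Za-z]+", text)

# Any Unicode letter: word characters (\w) minus digits and the underscore
unicode_words = re.findall(r"[^\W\d_]+", text)

# Case-insensitive matching via a flag rather than spelling out [Rr][Ee][Gg]...
case_insensitive = re.findall(r"\bregex\b", text, flags=re.IGNORECASE)

print(ascii_words, unicode_words, case_insensitive, sep="\n")
```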
Installing Python on Windows
Python 3.11.0 was just released. In honor of this, I wanted to write a quick walkthrough of installing Python on Windows. The process is relatively straightforward and only takes a couple of minutes. In addition, I’ll provide the steps for downloading and installing Microsoft Build Tools for Visual Studio, which is sometimes required for compiling…
Dynamic Regex Queries in Pandas
I recently encountered an issue in which I wanted to dynamically retain only those rows that matched a group of regular expressions or, in some cases, to exclude rows matching a particular set of regular expressions. This is relatively straightforward if the regular expressions are known in advance. Let’s begin by setting…