I was recently modifying a program that uses `argparse` to collect command-line options, in order to add an option to enable a ‘testrun’. The script begins by copying a large cohort to a server and then runs several steps that manipulate it. When I ran it with a new configuration file, one flag was misconfigured, but the program ran for several hours before reaching the misconfiguration. So I wanted to add a ‘testrun’ command-line option that lets a user ‘testrun’ the configuration file on a small sample to make sure everything works. If something is broken, it takes only a couple of minutes to notice and fix the configuration (or other errors). Then the user can drop the ‘testrun’ option and run the entire dataset.
I wanted to be able to supply the script with three possible states for the ‘testrun’ parameter:

- (no flag): not a testrun (`testrun = None`)
- `--testrun`: apply the default number of records, say 20 (`testrun = 20`)
- `--testrun=50`: apply a non-default value, e.g., when many records are likely to be filtered out and the test would otherwise be incomplete (or, vice versa, reduce the number to, e.g., 5, to avoid a long runtime when many matches are expected) (`testrun = 50`)
My first attempt involved combining a flag action (e.g., `store_true`) with an optional number of arguments (e.g., `nargs='?'`), but `argparse` rejects that combination, since a `store_true` action takes no arguments. A search of the `argparse` documentation, however, suggests an alternative using the `const` parameter: https://docs.python.org/3/library/argparse.html#const. From the description, `const` appears to be designed for exactly this edge case: when an option declared with `nargs='?'` appears on the command line without a value, the value of `const` is used instead.
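To make that failure concrete, here is a minimal reproduction of the first attempt. In current CPython releases the error typically surfaces as a `TypeError` about an unexpected `nargs` keyword, but since the exact exception type is an implementation detail, this sketch catches both `TypeError` and `ValueError`. The helper name `try_flag_with_nargs` is hypothetical.

```python
import argparse


def try_flag_with_nargs():
    """Return the exception argparse raises for store_true + nargs, or None."""
    parser = argparse.ArgumentParser()
    try:
        # store_true accepts no values, so argparse refuses an nargs setting
        parser.add_argument('--testrun', action='store_true', nargs='?')
    except (TypeError, ValueError) as exc:
        return exc
    return None


err = try_flag_with_nargs()
print(type(err).__name__, err)
```

The error is raised when the option is *defined*, not when the command line is parsed, so the script fails immediately.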
We can create the parameter like so:
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--testrun', nargs='?', const=20, type=int,
                    help='Run on a sample of records (default: 20) to test the configuration')

print(parser.parse_args([]))  # no args: falls back to default (None)
#> Namespace(testrun=None)
print(parser.parse_args(['--testrun']))  # flag only: falls back to const
#> Namespace(testrun=20)
print(parser.parse_args(['--testrun=50']))  # flag with sample size specified
#> Namespace(testrun=50)
print(parser.parse_args(['--testrun', '5']))  # flag with sample size specified
#> Namespace(testrun=5)
```
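One caveat worth checking with this pattern: because `nargs='?'` makes the value optional, `--testrun` will take the next command-line token as its value if that token is not itself an option. The sketch below assumes a hypothetical positional `config` argument (the original script's real interface isn't shown) to illustrate this. Putting `--testrun` after the positional works, while putting it before makes `argparse` try to convert the file name to an `int` and exit.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('config')  # hypothetical positional config file
    parser.add_argument('--testrun', nargs='?', const=20, type=int)
    return parser


def parse_or_status(argv):
    """Return the parsed Namespace, or argparse's exit status if it bails out."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code


# Flag after the positional: a bare --testrun falls back to const
print(parse_or_status(['config.yml', '--testrun']))
# Flag before the positional: --testrun swallows 'config.yml', which is not an int
print(parse_or_status(['--testrun', 'config.yml']))
```

Writing the value explicitly (`--testrun=20`) or placing the flag last avoids the ambiguity.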
We can now handle our use case by checking whether `testrun` is set before limiting our corpus. For example, in `pandas` it might look something like this:
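```python
import pandas as pd

# `path` is the input CSV; `testrun` comes from parser.parse_args().testrun
df = pd.read_csv(path)
if testrun:  # None (not a testrun) skips the truncation
    df = df.head(testrun)  # keep only the first `testrun` rows
```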
Or, perhaps more efficiently, use an iterator so the entire dataset is never loaded:

```python
if testrun:
    # read the CSV lazily in chunks and keep only the first `testrun` rows
    df = next(pd.read_csv(path, chunksize=testrun))
else:
    df = pd.read_csv(path)
```
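`pd.read_csv` also has an `nrows` parameter that limits how many rows are read, and since its default is `None` (read everything), the `testrun` value can be passed straight through without a branch. This is an alternative to the approaches above, not what the original script did; the in-memory CSV and the `load` helper are hypothetical stand-ins for the real input file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real input file: a header plus 100 data rows.
CSV_TEXT = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(100))


def load(testrun=None):
    """Read the CSV, limited to `testrun` rows when it is set."""
    # nrows=None (the default) reads every row; an int stops after that many rows.
    return pd.read_csv(io.StringIO(CSV_TEXT), nrows=testrun)


print(len(load()))    # 100 (not a testrun)
print(len(load(20)))  # 20 (--testrun)
```

Because `None` means "no limit" for both `argparse`'s default and `nrows`, the three command-line states map directly onto the read.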
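`click`’s version

In `click`, the same should be possible by declaring an option with an optional value (`is_flag=False` combined with `flag_value`; as far as I know, this requires click 8.0 or later):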
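```python
import click


@click.command()
# a bare --testrun yields flag_value (20), --testrun=50 yields 50,
# and omitting the option leaves default (None)
@click.option('--testrun', is_flag=False, flag_value=20, default=None, type=int)
def run(testrun):
    print(testrun)


if __name__ == '__main__':
    run()
```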