Teradata is a relational database released by Teradata Corporation. I have some data that lives there and occasionally need to access it — how to approach this with Python?
Teradata Corporation publishes the teradatasql
library to provide a PEP 249-compatible interface to the Teradata database. This Python package is actively maintained internally.
Connection
Basic usage requires installing teradatasql
(pip install teradatasql
), and having authorized credentials to a Teradata instance (requiring a host/server name, username, and password). You may need (or want) to configure a number of connection parameters, but the most important may be logmech
in which the method of logon authentication must be specified (e.g., TD2
[default] for ‘Teradata Method 2’, LDAP
, etc.). We’ll always add encryptdata=True
as well to make sure data is encrypted in transit.
# basic usage, but prefer encrypting password (see below)!
import teradatasql
conn = teradatasql.connect(None, host='hostname', user='username', password='Password1!', logmech='LDAP', encryptdata=True)
But this method of authentication has a clear problem: plain-text storing of the password. We could place a prompt for it, but this would not only becoming annoying for a regular user, but impossible for scheduled jobs.
import teradatasql
pwd = input('Password > ')
conn = teradatasql.connect(None, host='hostname', user='username', password=pwd, logmech='LDAP', encryptdata=True)
Encrypting a Password
Instead, Teradata provides a method for encrypting our password into a key file and a password file. When teradatasql
is installed, a teradatasql
folder is installed into your Python environment. This directory contains a peculiarily-named samples
directory with a number of runnable scripts. We will look at the TJEncryptPassword.py
script (here: .venv/teradatasql/samples/TJEncryptPassword.py
). This requires running the script with the following customizable parameters (there are no defaults):
# ensure in appropriate virtual environment
python TJEncryptPassword.py AES/CBC/NoPadding 256 HmacSHA256 td_key.prop td_pwd.prop <hostname> <username> <password>
# E.g.,
python TJEncryptPassword.py AES/CBC/NoPadding 256 HmacSHA256 td_key.prop td_pwd.prop hostname username Password1!
Note that some of these options are configurable. Running this command will also initiate a sample connection. Don’t be concerned if this fails! The connection defaults to logmech=TD2
and is not configurable. If you want it to be testable, don’t be afraid to manually alter this file so that the connection string correctly specifies a logmech='<YOUR LOGMECH>'
.
Alternatively, if password encryption is a regular activity for your organisation, it may be worth creating your own script (or forking and customizing existing options like teradata_encrypt
so that a user won’t have to specify so much or concern oneself about which configuration options are preferred.
This process will create two files which contain the encryption algorithm properties and an encrypted password with hash/salt.
Connecting with an Encrypted Password
Once the password is encrypted, we’ll need to alter our connection configuration.
import teradatasql
conn = teradatasql.connect(
None, host='hostname', user='username',
password='ENCRYPTED_PASSWORD(file:td_key.prop,file:td_pwd.prop)',
logmech='LDAP', encryptdata=True,
)
Be sure to specify the full path and (even if one Windows) use a forward slash as the path separator:
password = 'ENCRYPTED_PASSWORD(file:C:/Users/me/td_key.prop,file:C:/Users/me/td_pwd.prop)'
It may also make sense to keep the td_key.prop
and td_pwd.prop
in separate locations.
Usage
Now that we have a connection, how do we use it? The Teradata connection should behave like most other connecton objects in Python. You can see a number of sample programs in .venv/samples/
. When looking at these examples, note that the teradatasql
package uses the convention of beginning variables with s
for str
s and i
for int
s.
query = '''
SELECT * FROM table1 WHERE condition = 1
'''
with teradatasql.connect(
None, host='hostname', user='username',
password='ENCRYPTED_PASSWORD(file:td_key.prop,file:td_pwd.prop)',
logmech='LDAP', encryptdata=True,
) as conn:
with conn.cursor() as cur:
cur.execute(query)
for row in cur:
print(row)
We can (currently) also use this directly with pandas
, although it is not technically supported (pandas
prefers sqlalchemy
).
import pandas as pd
# entire query in one go
df = pd.read_sql_query(query, conn)
# for larger queries, consider chunking
df = pd.concat(df for df in pd.read_sql_query(query, conn, chunksize=2000))
sqlalchemy
If we wanted to use sqlalchemy
, we’ll need to look into teradatasqlalchemy
(pip install teradatasqlalchemy
), which is supported by Teradata Corporation. We’ll use the same arguments as before, but place them into a sqlalchemy
-styled connecton string:
import sqlalchemy as sa
# also: need `teradatasqlalchemy` installed
hostname = 'hostname'
username = 'username'
password = 'ENCRYPTED_PASSWORD(file:td_key.prop,file:td_pwd.prop)'
logmech = 'LDAP'
eng = sa.create_engine(f'teradatasql://{hostname}/?user={username}&password={password}&logmech={logmech}&encryptdata=true')
And when integrating with pandas
:
import pandas as pd
query = '''
SELECT * FROM table1 WHERE condition = 1
'''
# entire query in one go
df = pd.read_sql_query(query, eng)
# for larger queries, consider chunking
df = pd.concat(df for df in pd.read_sql_query(query, eng, chunksize=2000))