Retrieve UMLS Data with API Key

The basic need I have is to convert the codes in a MedDRA dataset to CUIs (UMLS Concept Unique Identifiers). If there were only 10 or so, I’d look them up in the Metathesaurus manually, but I have a dataset of 155 codes related to COVID-19. Once I have the CUIs, I can limit the output from MetaMap Lite to only the concepts with relevant COVID-19 CUIs.

To do this programmatically, I am planning to use httpie to send the requests that the documentation shows as curl commands. Well, the documentation looks a bit sparse, but I think I can make it work.

1. Connect to the Metathesaurus home page and log in. (If you don’t have an account, you need to sign up first.)
2. To obtain the API key for your account, choose “My Profile”. (The key can be reset by going to “Edit Profile” at the bottom of the page.)

3. After activating a new virtual environment and installing httpie, I ran this command:

http --form POST https://utslogin.nlm.nih.gov/cas/v1/api-key apikey={INSERT_API_KEY_VALUE_HERE}

The response is a short HTML page, and the TGT appears at the end of its form’s action URL (it begins with TGT and ends with cas).
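The script further down pulls the TGT URL out of that form with lxml. If you would rather avoid the extra dependency, here is a minimal stdlib-only sketch, assuming the response keeps the same shape (a `<form>` whose `action` URL ends in the TGT). `extract_tgt_url` and `tgt_id` are hypothetical helper names, not part of any UMLS library.

```python
from html.parser import HTMLParser


class _FormActionParser(HTMLParser):
    """Collect the action attribute of every <form> tag in a document."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attrs is a list of (name, value)
        if tag == "form":
            for name, value in attrs:
                if name == "action" and value:
                    self.actions.append(value)


def extract_tgt_url(html_text):
    """Return the first form action URL, assumed to be the TGT URL."""
    parser = _FormActionParser()
    parser.feed(html_text)
    parser.close()
    if not parser.actions:
        raise ValueError("no <form action=...> found in response")
    return parser.actions[0]


def tgt_id(tgt_url):
    """Return the last path segment of the TGT URL (the TGT-...-cas part)."""
    return tgt_url.rstrip("/").rsplit("/", 1)[-1]
```

The full URL, not just the TGT id, is what gets POSTed to when requesting a ticket, which is why `extract_tgt_url` returns the whole thing.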

This is my ‘TGT’ (ticket-granting ticket), which I should generate once per session. I then use the TGT to request a single-use service ticket (‘ST’) for each API call. It would be rather time-consuming and error-prone to keep using httpie for this, so let’s write a short script.
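The session/request split can be captured in a small object that reuses one TGT and hands out a fresh ST on every call. This is a sketch, not NLM code: `TicketSource` is a hypothetical name, the two fetch functions are meant to be `get_tgt` and `get_ticket` from main.py below, and the `tgt_lifetime` refresh interval is an assumed value you should check against the current UTS documentation.

```python
import time
from typing import Callable, Optional


class TicketSource:
    """Reuse one TGT per session and issue a fresh single-use ST per request.

    fetch_tgt() -> str and fetch_ticket(tgt) -> str are supplied by the caller.
    tgt_lifetime (seconds) is an ASSUMED refresh interval, not an official value.
    """

    def __init__(
        self,
        fetch_tgt: Callable[[], str],
        fetch_ticket: Callable[[str], str],
        tgt_lifetime: float = 4 * 60 * 60,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_tgt = fetch_tgt
        self._fetch_ticket = fetch_ticket
        self._lifetime = tgt_lifetime
        self._clock = clock
        self._tgt: Optional[str] = None
        self._issued_at = 0.0

    def ticket(self) -> str:
        now = self._clock()
        # Get a TGT on first use, and again once the assumed lifetime has passed
        if self._tgt is None or now - self._issued_at >= self._lifetime:
            self._tgt = self._fetch_tgt()
            self._issued_at = now
        # An ST is only good for one request, so always ask for a new one
        return self._fetch_ticket(self._tgt)
```

In the main loop this would replace the two separate calls with `source = TicketSource(get_tgt, get_ticket)` before the loop and `ticket = source.ticket()` inside it. The injectable `clock` is only there so the refresh logic can be tested without waiting hours.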

In a config.py, I’ll place my API_KEY and the LIST of target MedDRA codes I’m interested in. I’ll also store the VERSION (set to ‘current’, which is the most recent UMLS release). A future refactoring should load these from the command line.
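As a starting point for that refactoring, here is a minimal argparse sketch that supplies the same three values from the command line. The option names (`--api-key`, `--umls-version`) and the `UMLS_API_KEY` environment variable are hypothetical choices of mine; reading the key from the environment also keeps it out of any file that might end up in version control.

```python
import argparse
import os


def parse_args(argv=None):
    """Command-line replacement for config.py (hypothetical interface)."""
    parser = argparse.ArgumentParser(
        description="Look up UMLS CUIs for a list of MedDRA codes."
    )
    # Plays the role of LIST in config.py
    parser.add_argument("codes", nargs="+", help="MedDRA codes to search for")
    # Plays the role of VERSION; 'current' is the most recent UMLS release
    parser.add_argument(
        "--umls-version",
        default="current",
        help="UMLS release to query (default: current)",
    )
    # Plays the role of API_KEY; falls back to an environment variable
    parser.add_argument(
        "--api-key",
        default=os.environ.get("UMLS_API_KEY"),
        help="UTS API key (defaults to the UMLS_API_KEY environment variable)",
    )
    args = parser.parse_args(argv)
    if not args.api_key:
        parser.error("an API key is required (--api-key or UMLS_API_KEY)")
    return args
```

The parsed values are then available as `args.codes`, `args.umls_version` and `args.api_key`.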

Then we can create a main.py, which I’ve based on some sample code provided by HHS. That code isn’t great (a pull request or two would help), but it points you in the right direction.

import requests
from lxml.html import fromstring

from config import API_KEY, VERSION, LIST

HEADERS = {'Content-type': 'application/x-www-form-urlencoded', 'Accept': 'text/plain', 'User-Agent': 'python'}


def get_tgt():
    """Retrieve the session-based ticket-granting ticket (TGT) URL"""
    r = requests.post(
        'https://utslogin.nlm.nih.gov/cas/v1/api-key',
        data={'apikey': API_KEY},
        headers=HEADERS,
    )
    r.raise_for_status()
    # The response is an HTML form whose action URL ends in the TGT
    return fromstring(r.text).xpath('//form/@action')[0]


def get_ticket(tgt):
    """Retrieve a single-use, request-based service ticket (ST)"""
    r = requests.post(
        tgt,
        data={'service': 'http://umlsks.nlm.nih.gov'},
        headers=HEADERS,
    )
    r.raise_for_status()
    return r.text


if __name__ == '__main__':
    tgt = get_tgt()
    cuis = []
    for code in LIST:
        ticket = get_ticket(tgt)  # each request needs a fresh ST
        # NB: 'exact' must be used when searching for 'code';
        # 'sabs' restricts matches to MedDRA (source abbreviation MDR)
        r = requests.get(
            f'https://uts-ws.nlm.nih.gov/rest/search/{VERSION}',
            params={'string': code, 'ticket': ticket, 'inputType': 'code',
                    'searchType': 'exact', 'sabs': 'MDR'},
        )
        r.raise_for_status()
        for result in r.json()['result']['results']:
            # A search with no hits returns a placeholder result whose ui is 'NONE'
            if result['ui'] != 'NONE':
                cuis.append(result['ui'])
    print(cuis)

Here are a couple of codes you can try out (the full list I’m using is in the spreadsheet here):

The output I get is the desired list of UMLS CUIs: ['C0206750', 'C5244047', 'C5244047', 'C5203670', 'C5203670', ...].
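Some CUIs appear more than once (C5244047 and C5203670 each show up twice), most likely because several MedDRA codes in the input map to the same UMLS concept. If the loop in main.py appended (MedDRA code, `result['ui']`) pairs instead of bare CUIs, a small helper like the hypothetical `group_cuis` below could keep the code-to-CUI mapping for auditing and still produce an order-preserving, de-duplicated list for filtering.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_cuis(
    pairs: Sequence[Tuple[str, str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Given (meddra_code, cui) pairs, return (code -> [cuis], unique cuis)."""
    by_code: Dict[str, List[str]] = defaultdict(list)
    for code, cui in pairs:
        if cui not in by_code[code]:
            by_code[code].append(cui)
    # dict.fromkeys keeps the first occurrence of each key, in insertion order
    unique = list(dict.fromkeys(cui for _, cui in pairs))
    return dict(by_code), unique
```

The mapping makes it easy to see which codes collapse onto the same concept, while the unique list is what you would actually filter on.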

Now I can use the list to select the text mentions of interest from the MetaMap Lite output, and also package this up for future use.
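For the selection step, here is a minimal sketch of filtering MetaMap Lite results by CUI. It assumes the pipe-delimited MMI output format (id|MMI|score|preferred name|CUI|semantic types|...), where the CUI is the fifth field. If you use a different output format, adjust `cui_field` or the parsing. `filter_mmi_lines` is a hypothetical name.

```python
from typing import Iterable, Iterator, List, Set


def filter_mmi_lines(
    lines: Iterable[str], keep: Set[str], cui_field: int = 4
) -> Iterator[List[str]]:
    """Yield pipe-split records whose CUI column is in `keep`.

    ASSUMES fielded MMI output, where the CUI is the fifth field (index 4).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines between documents
        fields = line.split("|")
        if len(fields) > cui_field and fields[cui_field] in keep:
            yield fields
```

Usage would look like `hits = list(filter_mmi_lines(open(path), set(cuis)))`, where `path` points at your MetaMap Lite output file. Passing a set keeps each membership check constant-time even when the CUI list grows.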