I am one of the primary managers of my company’s public GitHub site. While we generate a lot of code, most of it is not designed to be shareable (it is run once or twice for a particular project before we move on to the next), as the focus is on the research work. For…
Coding with a Chatbot for Dummies
My first attempt to code with a chatbot was several years ago and involved using ChatGPT to do a couple of data transformations using pandas. The dataset was not large. My procedure was something like: Regarding #3, I think I started by trying to type it out myself: the idea of making sure I understood…
Evaluating Generative Chatbots
I was at an epidemiology conference about a month ago – not a typical location for a data scientist, but circumstances found me there. A number of sessions embraced a certain (albeit nervous) enthusiasm regarding access to decoder-only transformers, often called ‘AI’ or ‘large language models’. There is a certain buzz and excitement…
Calculating Jaccard Similarity Coefficients in `pandas`
I’m quite accustomed to looking at performance against some gold (or silver) standard. It’s nice to have some ready definition of ‘truth’ so that, when applying some algorithm, we can clearly see whether it matched or failed to match. More recently, however, I was attempting to compare the outputs of multiple UMLS-processing NLP systems on…