I'm Neal, a graduate student in data science at UNC Charlotte.
- Language+Semantics: This has evolved from an enduring interest in foreign languages (mostly Spanish, Japanese, and Mandarin Chinese) to dabbling in linguistics and natural language processing with tools like SpaCy. More recently I've been working with large language models, mainly to clarify their immediate practical applications, quirks, and limitations. Right now I'm particularly intrigued by the unexplored potential of semantic spaces and vector embeddings, and I'm diving into how LLMs can be used to generate ontologies or extract metadata to make information more useful.
- Automated Decision-Making: It's more than training accurate models. It's data governance and having processes to integrate your decisions into. It's designing your systems and processes to generate the data you need to evaluate and improve. It's recognizing the gap between what a model was trained to do and how it's used. It's model risk management and evaluating and explaining models in depth with tools like PiML. It's continuous monitoring of decisioning systems and the ability to fix them promptly when things go wrong. It's remembering that human oversight is always necessary. Most of my experience in automated decision-making has been in payment fraud detection, but I'm always looking to branch out.
- Interface/Experience Design: Clarity and usability. Plain language. Privacy by design. Using choice architecture and careful visualization to counter patterns of bias in human perception and decision-making. All just as relevant whether your customers are internal or external.
- Agility: Adaptation requires change, so change must be safe and easy: implement continuous integration and delivery for everything. Systems which cannot be readily understood cannot be safely or easily changed, so seek design simplicity and clarity in everything. Use type safety, clean code organization, and modular, testable systems. Ensure fast failure. Employ containerization, infrastructure as code, and continuous documentation. Stay focused on value: don't let your methodology or support activities subsume your objectives. Agility grows out of a Grug-brained attitude.
(Mid 2024) Model Development for Malicious URL Detection
No Phishing, a small individual project to develop and evaluate a classification model for malicious URL detection, mainly using PiML.
(Late 2024) Product Development with LLMs + RAG
MinuteMate, a group semester project to develop and implement an LLM product that improves public access to formal public meetings via retrieval-augmented generation (RAG). It consists mainly of (1) a preprocessing pipeline which populates a vector database with useful vector-indexed chunks of text from meeting transcriptions, agendas, and minutes and (2) a chat app whose backend implements RAG-enhanced chat functionality and serves it to a public Streamlit frontend via an API. To support this, we are also deploying smaller open language models like Llama3, both locally and in the cloud, rather than relying strictly on existing model services. It is very much a work in progress.
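For a rough feel of the two pieces described above, here is a minimal, self-contained sketch of the chunk, index, retrieve, and prompt flow. It is not MinuteMate's code: the function names are hypothetical, the "embedding" is a toy bag-of-words count standing in for a real embedding model, and a plain Python list stands in for the vector database.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Preprocessing step: chunk each document and store (chunk, vector) pairs.
    A list stands in for the vector database here."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Assemble the retrieved context and the question into one prompt for an LLM."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up example documents, for illustration only.
documents = [
    "The council approved the budget for road repairs.",
    "Public comment focused on the new library hours.",
]
index = build_index(documents)
question = "What did the council approve for road repairs?"
top_chunks = retrieve(index, question, k=1)
prompt = build_prompt(question, top_chunks)
```

In a real deployment the prompt would then be sent to a language model, and the index would live in a proper vector store populated by the preprocessing pipeline rather than being rebuilt in memory.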
- Ibis: Flexible data manipulation in a Python context. Query numerous back-end data sources, both local and cloud, using SQL and Python together and converting between them smoothly.
- PiML: A set of tools for developing ML models and evaluating them in depth.
- Vega: A declarative language that illustrates a grammar of visualization. Try it!
- prodigy: A tool for fast, efficient annotation.
- SpaCy: A solid NLP toolset for Python.
- Excalidraw: A lightweight tool for friendly, approachable diagrams. Try it out here!