Hello and welcome! I’m Trang Le. I’m a postdoctoral fellow with Jason Moore at the Computational Genetics Lab, University of Pennsylvania. I enjoy developing machine learning methods for analyses of biomedical data, including neuroimaging (functional/structural MRI), transcriptomics and genotypes. Most of the datasets I work with are high-dimensional (i.e., have many predictors/features), so I spend most of my time building feature selection algorithms for these data. I trade my bias toward the nearest-neighbor concept for lower variance of my methods and better generalizability. When I’m not knee-deep in data, I run, dance and seasonally ski.

Explorations

A few days after nonessential businesses closed due to the COVID-19 pandemic, the streets and trails of Philadelphia are filled with runners. It’s nice that a lot of people have reverted to this basic form of exercise (welcome to the club!), but it pains us regular runners physically when we see you run in jeans and cotton t-shirts. If running for you is an outdoor family activity and the goal is …

Read More…

Earlier this month, I had a blast leading a machine learning workshop at an R Ladies Philly meetup. After an introduction to machine learning, we used a beer review dataset to predict the alcohol concentration of beer using the caret R package. We even dabbled in text analysis. Everyone was awesome and asked many excellent questions throughout! I used RStudio Cloud to facilitate this workshop to …

Read More…

The Quaker Strong challenge

November 27 2019

In the last Quaker Strong challenge in the spring (March Madness edition), I competed with several friends and enjoyed seeing how they were doing with the challenge. This time, with 46 registered participants, I figured it might be fun to write a few lines of code and make some fun visualizations out of the logged progress. Code and details can be found here. To protect the …

Read More…

TPOT: Where do I start?

November 5 2019

Tree-based Pipeline Optimization Tool (TPOT) is an automated machine learning tool that helps data scientists find the optimal model pipeline for their prediction problem. Using genetic programming (GP), TPOT explores different pipelines (sequences of feature selectors, model classifiers, etc.) and recommends the one with the best cross-validated score found after a specified number of generations. Here …

Read More…

 

Recent Works

  • Fundamentals of AI guest lecturer, University of Pennsylvania, Mar 30, 2020  
  • Detect network interactions and control for confounders and multiple testing, Rocky Mountain Bioinformatics Conference, Dec 6, 2019      
  • Multilocus risk scores, Rocky Mountain Bioinformatics Conference, Dec 6, 2019      
  • Machine learning workshop, R Ladies Philly, Dec 2, 2019      
  • Multilocus risk scores, Penn Genetics Retreat, Sep 4, 2019      
  • Multilocus risk scores: rethinking genetic risk scores to account for epistasis, Feb 25, 2020
  • npdr: Select features with nearest-neighbor concepts (2019)      
  • Trang T Le, Weixuan Fu and Jason H Moore (2019) Scaling tree-based automated machine learning to biomedical big data with a feature set selector. doi:10.1093/bioinformatics/btz470