Hello and welcome! I’m Trang Le. I’m a postdoctoral fellow with Jason Moore at the Computational Genetics Lab, University of Pennsylvania. I enjoy developing machine learning methods for analyzing biomedical data, including neuroimaging (functional/structural MRI), transcriptomics and genotypes. Most of the datasets I work with are high dimensional (i.e., have many predictors/features), so I spend most of my time building feature selection algorithms for these data. I trade my bias toward the nearest-neighbor concept for lower variance of my methods and better generalizability. When I’m not knee-deep in data, I run, dance and seasonally ski.

## Explorations

### TPOT: Where do I start?

#### November 5 2019

Tree-based Pipeline Optimization Tool (TPOT) is an automated machine learning tool that helps data scientists find the optimal model pipeline for their prediction problem. Using genetic programming (GP), TPOT explores different pipelines (sequences of feature selectors, classifiers, etc.) and recommends the one with the best cross-validated score after a specified number of generations. Here …

### #Openscience: impact of scientific research through open resources

#### October 31 2019

Last week, I got to attend a series of presentations followed by a panel discussion on open science at Penn’s Van Pelt-Dietrich Library during Open Access Week. The panel featured (from left to right) Ted Satterthwaite, Jennifer Stiso and Daniel Himmelstein, all of whom initiated enlightening discussions around different aspects of open science. (Photo credit: Rebecca Miller) Jennifer Stiso …

### To age or not to age

#### October 1 2019

I recently stumbled upon this article by Gervasio Piñeiro and colleagues analyzing the method of model evaluation via plotting observed and predicted $$y$$. The authors argue that, when plotting predicted versus observed values, observed should be placed on the $$y$$-axis and predicted on the $$x$$-axis. Because this article is unfortunately behind a paywall, I’m going to show the quick simulation I have …

### Penn Big Data: Opportunities and challenges in health science applications

#### September 25 2019

Earlier this week, I attended the first Penn conference on big data in population health sciences. This was my first conference where I got to attend all the talks, and it was gratifying. The organizers did a wonderful job selecting a broad range of topics to cover, from electronic databases and biobanks to digital and mobile health. I learned so much! Some of these topics, according to my …

## Recent Works

• Multilocus risk scores, Penn Genetics Retreat, Sep 4, 2019
• npdr: Select features with nearest-neighbor concepts (2019)
• Trang T Le, Weixuan Fu and Jason H Moore (2019) Scaling tree-based automated machine learning to biomedical big data with a feature set selector. doi:10.1093/bioinformatics/btz470
• TPOT: Overview and live demonstration, Clinical Research Informatics Core, University of Pennsylvania, Mar 13, 2019
• Trang T Lê, Zach Osman, D K Watson, Martin Dunn and B A McKinney (2019) Generalization of the Fermi pseudopotential. doi:10.1088/1402-4896/ab0811
• STIR feature selection, Pacific Symposium on Biocomputing, Jan 5, 2019
• Scalable automated machine learning, AI Therapeutics, Dec 21, 2018
• Trang T Le, Weixuan Fu and Jason H Moore (2018, preprint) Scaling tree-based automated machine learning to biomedical big data with a dataset selector. doi:10.1101/502484
• Statistical Inference Relief (STIR) feature selection, Mid-Atlantic Bioinformatics Conference, Oct 29, 2018
• Trang T Le, Ryan J Urbanowicz, Jason H Moore and Brett A McKinney (2018) STatistical Inference Relief (STIR) feature selection. doi:10.1093/bioinformatics/bty788