I use a number of programs in my data science workflow.
When I’m exploring a new dataset for the first time, I reach for R. I have used R regularly since 2009, when I was introduced to it in my statistics classes at Cal Poly. At the time, R was not widely used outside of academic statistics departments, but the rise of data science as a new discipline has made R accessible to a much larger audience. Today it is especially welcoming to newcomers to data science, thanks to the contributions of RStudio, the tidyverse, and the global community of R developers. I find R most effective for exploratory data analysis and for data visualization with ggplot2.
For data science in production, I depend on Python. I began using Python in 2015 during an internship with the Crew State Monitoring group at NASA Langley Research Center. Unlike R, which specializes in statistics, Python is a general-purpose language that I find useful for scraping data from websites, developing end-to-end data analysis pipelines, and training machine learning models.
I used Julia for several years in graduate school after I could no longer stomach the sluggish convergence of my Markov chain Monte Carlo (MCMC) algorithm in R. MCMC for Bayesian analysis is a breeze with Mamba.jl. I don’t use Julia often these days, but I’m eager to reintroduce it into my workflow after its recent v1.0 release.
Finally, bash scripts are the glue that holds everything together.
I use Visual Studio Code almost every day, a fact that may surprise those who know my general distaste for Microsoft products. The reason is simple: VS Code is excellent open-source software. I also use RStudio for exploratory data analysis, and Neovim with Tmux for tasks that require heavy use of the terminal.
Getting things done
I have a personal subscription to G Suite, which I use for Gmail, Google Drive, Google Docs, and Google Sheets.
In recent months, I have moved my note-taking and schedule planning to Notion, and I couldn’t be happier with it. Notion is difficult to describe, but I think of it as a productivity platform for my whole life. I highly recommend trying it out.
Dotfiles are configuration files that live in a user’s home directory. By keeping my dotfiles on GitHub, I can get a new system up and running in no time, back up changes, and share my settings with others. In particular, I maintain my global Git configuration, Zsh aliases, custom Oh-My-Zsh theme, Neovim and Tmux settings, R profile, and Brewfile for macOS.
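A common way to put dotfiles from a cloned repository to work on a new machine is to symlink each file into the home directory, so edits made in place are tracked by Git automatically. The sketch below is a minimal illustration of that idea, not my actual setup: the `link_dotfiles` helper is hypothetical, and it assumes the files sit at the root of the repository and that any existing file should be moved aside with a `.bak` suffix.

```python
from __future__ import annotations

from pathlib import Path


def link_dotfiles(repo: Path, home: Path, names: list[str]) -> list[Path]:
    """Hypothetical helper: symlink each named file from `repo` into `home`.

    Existing files (or stale links) are renamed with a `.bak` suffix first.
    Returns the list of links that were newly created.
    """
    linked = []
    for name in names:
        source = repo / name
        target = home / name
        # Skip files that already point at the repository copy.
        if target.is_symlink() and target.resolve() == source.resolve():
            continue
        # Move any existing file or dangling link out of the way.
        if target.exists() or target.is_symlink():
            target.rename(target.with_name(target.name + ".bak"))
        target.symlink_to(source)
        linked.append(target)
    return linked
```

Because the function skips links that already point at the repository, running it a second time is harmless, which makes it easy to rerun after pulling new changes.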