29 Aug 2022 —
I’ve written here before about honeypot ants. The main takeaway is that Westerners are merely recycling the very few scientific facts and visual representations that we have about these species. To further highlight this… PAUCITY!, even a single new image is capable of changing how we think about honeypot ants—and even how we think about ants in general.
I am just tickled pink to reveal that I am the vigilant myrmecologist who discovered such an image, and I was able to get a single-author publication in the prestigious scientific journal Ecology for my efforts. Although the article was changed from my initial intent during the peer-review process, I’m posting it here in its original form (and original title).
↪ [Read more...]
24 Aug 2022 —
Listen, I’m just going to say it all now: I’m not going to post my follow-up to that initial article about Western depictions of honeypot ants. Although Westerners don’t seem very creative or nuanced when it comes to depicting these ants in MOST art… the reveal was going to be that people get quite creative when it comes to NSFW depictions.
But I’m not going to release that “deep dive,” so don’t worry about it too much.
↪ [Read more...]
23 Sep 2020 —
So you have a bunch of plots and they all have color scales with different limits and you realize that eyeballing the numbers trying to normalize the colors in your head is a bad way to compare them, huh?
You could manually review each plot, then manually set the limits of each color scale so that they encompass the same set, and then hope you never change the data in a way that would invalidate those limits, but that’s dumb, huh?
Come along and find out how to set non-position scales to be constant across multiple plots!
↪ [Read more...]
07 Jun 2020 —
You would have thought that I would have learned to RTFM by now, but too often I find myself learning the subtleties of an R package by tearing it apart. I swear that I read the documentation and do the requisite Googling and StackExchanging, but it always seems whatever I want to do is just a little too esoteric for the mainstream. I’m just too addicted to R’s wonderful metaprogramming abilities, and I guess making that work often involves needing to understand deeper parts of the code.
I got distracted by these types of problems most recently with the knitr package in R. knitr is used to make basically any type of document these days, and the package does an amazing job of walking the line between being user-friendly and deep and customizable. Here I’ll show you a few tricks that will hopefully get you thinking about how you can customize knitr!
↪ [Read more...]
14 May 2020 —
I grew up in Indiana, and although I was a Boy Scout, and the occasional begrudging hiker, I would have never said I was an “outdoors” person. But as I’ve gone through life, I’ve begun to suspect that this was a product of circumstances more than disposition. I’ve fallen in love with hiking and camping in the American Southwest, and I think the reason why I never got excited about it growing up is that hiking in Indiana is boring.
I don’t know how I became aware of foraging (as a modern, hipster concept), but I do know that it’s gotten me to pay much more attention to nature around me. Like hiking, I think that going outside would have been much more appealing to me as a child if somebody had told me how much of the outside you can just eat.
↪ [Read more...]
04 Apr 2020 —
Plotly, which lets you interact with data and plots in incredibly pleasing ways (see this post by my brother and me for examples), offers a load of cool possibilities with R, whether you want dashboards or engaging data visualizations. It’s super web-friendly and fits like a glove into workflows that knit HTML.
The only problem is that you’re basically screwed if you want to use Plotly (or any HTML widgets) with Jekyll or GitHub Pages. Sure, there are ways you can do it, but they’re enormously hacky and would lead to an insane posting workflow. In this post, I will show you how to do it the right way.
↪ [Read more...]
27 Mar 2020 —
Science is all about asking the right questions, yada yada, something like that. But normally the sorts of questions I ask require ages of preparation, multiple failed experiments, and oodles of tedious computer analysis. However, this time it was going to be different. We had a simple question, easy data, and a straightforward approach. A simple job, in and out, right?
Our question was about questions—questions, question marks, the film industry, and making money.
↪ [Read more...]
27 Mar 2020 —
You may have seen the post my brother and I just made about our investigation of some of the Hollywood superstitions. If not, go read that first. This is just a little writeup about some of the technical problems we had working on the project, perfect for someone interested in how to do their own fast and dirty data science investigations, or someone interested in listening to me complain.
↪ [Read more...]
20 Jan 2020 —
Although I started this whole personal weight analysis thing because I thought I was going to start dieting, I quickly realized that frequent fine-grained weight measurements could serve other, more experimental purposes.
Finally, the scatological blog post you’ve all been waiting for! 💩💩💩
Part 4: Bowel movements, losing weight while sleeping, and other questions
↪ [Read more...]
04 Jan 2020 —
Previously, I’ve talked about how I’ve started to collect fine-grained information about my own weight and how I went about cleaning the data for analysis.
Before I go into my experiments, I’m making a temporary detour into different modeling questions.
Feel free to come along with me as I try to make sense of trends in my own weight with generalized additive models!
Part 3: Modeling long-term weight data with GAMs
↪ [Read more...]
02 Jan 2020 —
If you don’t remember my previous post about my custom Bluetooth scale from a couple of months ago, I’ve been collecting a large amount of fine-grained information about my weight for the past couple of months.
In this post, I’ll walk through my initial look at it, some problems I had with cleaning the data, and what I did to fix them.
Part 2: Cleaning/preparing personal weight data
↪ [Read more...]
07 Sep 2019 —
I’m getting fat. Well, fatter. My twin brother and perfect control experiment recently lost a lot of weight dieting, which got me thinking about going on one myself. As a data scientist / machine learning engineer, I saw this as a perfect opportunity to get some good data.
I’ve spent three months working on a Bluetooth weight collection system and zero months on a diet, so without further ado, let me walk you through starting up your very own Bluetooth bathroom scale.
Part 1: Raspberry Pi 4 Bluetooth scale
↪ [Read more...]
17 Jun 2019 —
At my new machine-learning job (internship), I use a lot of Jupyter notebooks. If you don’t know what a Jupyter notebook is, it’s kind of like a more interactive version of an R Markdown sheet, but for Python. They’re great, but there were a few features (or lack thereof) that really got on my nerves. Luckily the stuff under the hood of Jupyter notebook is crazy flexible, and with a little know-how we can jerry-rig us some cool stuff.
Specifically, we can use IPython’s “magic” commands.
↪ [Read more...]
07 May 2019 —
If you’ve been visualizing different types of data for long enough, you’re basically guaranteed to run up against the bounds of what’s easy/possible to do in whatever software you use.
I almost exclusively use R’s ggplot2 to plot stuff, and I’ve found multiple times that there are just some things that you can’t do and that the development team doesn’t plan on implementing anytime soon.
Here, I’ll share some code to make the impossible possible: setting different scales / coordinates for individual facets.
↪ [Read more...]
07 Feb 2019 —
I guess I never made a post about it, but a while ago I had my first R package accepted to CRAN: catchr 0.1.0. It started based on a function I had written that I found myself using time and time again: a function that would collect all the warnings, messages, and errors raised in the process of evaluating code, and return them with the result of the code.
I was so proud of this “clever” code, but something eventually happened that made me take a deep dive into the rlang package by the RStudio people. Their code was so much better than mine that I felt viscerally humiliated.
After recovering a little from the shock, I reevaluated all of my preconceived notions for the package, and outlined the tools I wanted to provide people with. From there, I basically rebuilt it from scratch in a principled way, and turned it into something I’m actually proud of.
I present to you, catchr 0.2.0!
↪ [Read more...]
19 Jan 2019 —
Did you know that information on the personal finances of anyone running for US Congress, the Presidency, and any number of federal offices is available to the public? It’s true, and an organization has collected all this information, processed it, and put it into a single format and location for free! A perfect treasure trove for data scientists and researchers!
The only catch is… it sucks.
↪ [Read more...]
28 Sep 2018 —
As you may have noticed in my previous, criminally short introduction, there were no pictures of any actual honeypot ants. Given that this is a series of blog posts centered around these ants, that might seem strange and self-defeating. I would personally agree: at this point, I don’t remember why I thought it was a good idea.
But you know what? Let me make that up to you. We’re going H-A-double-M on pictures today, folks: welcome to the porn-free, image-palooza post.
↪ [Read more...]
08 Aug 2018 —
Did you know that if you try to scrape too many pages at a time from the same website, it will sometimes think you’re being malicious and block your IP address? Or that using Python’s urllib or shelve packages totally sucks on some computers?
Come, let me show off (er, teach you about) some of the more nuanced aspects of web-scraping buttloads of data from the internet.
↪ [Read more...]
27 Jun 2018 —
Think of the most popular animal you know. Most of us will imagine pandas, elephants, dolphins, maybe bald eagles. These are the names kids shout out when asked about their favorites; these animals are considered “charismatic.” We paint pictures of them, we write songs about them, we use them as symbols, and we include them in metaphors.
Now think of the most charismatic ant you’ve heard of.
↪ [Read more...]
26 Jun 2018 —
(Second post here)
Think of the most popular animal you know. Most of us will imagine pandas, elephants, dolphins, maybe bald eagles. These are the names kids shout out when asked about their favorites; these animals are considered “charismatic.” We paint pictures of them, we write songs about them, we use them as symbols, and we include them in metaphors.
Now think of the most charismatic ant you’ve heard of.
↪ [Read more...]
13 Jun 2018 —
After reading my earlier blog post about running asynchronous R calls on a remote server, you probably got pumped at the idea of “nested futures”, remote clusters, or my use of the marquee HTML tag. Regardless of your excitement, it’s time to find out how you can take your parallel processing game to the next level.
This post is meant for two purposes: the first is to document an example of using remote clusters with R, and the second is to serve as instructions/reference for my lab members at Rochester.
↪ [Read more...]
12 Jun 2018 —
Instead of using the term “computer nerd”, I always preferred to describe myself as a “computer jock” growing up. After all, if there had been a programming team in high school, I felt like I probably would have made varsity.
But sometimes, athletes are limited by the equipment they have on hand. Cyclists can’t ride the Tour de France with tricycles, and my lab can’t run hundreds of thousands of simulations on our 2014 MacBooks. We’ve been feeling the crunch as of late, which is why we’ve begun exploring the various computing clusters available to us on campus.
This post is meant for two purposes: the first is to document an example of using remote clusters with R, and the second is to serve as instructions/reference for my lab members at Rochester.
↪ [Read more...]
05 Mar 2018 —
If I wanted to make this post sound professional and industrious, I would say that my motivations behind this project were because I’ve started working towards my Bayesian model of webcomic updates again, and that I’m taking an intermediate step by analyzing data from similar content creators.
But the truth is, I was just pissed off that I couldn’t read the manga I wanted to.
These are the Python lessons I learned scraping manga scanlations off of 4chan.
Part 3: The ‘pythonicity’ of decorators
↪ [Read more...]
04 Mar 2018 —
Have you ever wanted to make a small pull request to improve an open-source project that you have a heavily modified version of? For example, say you have a personal version of a repo that you’ve changed a bunch with a particular aspect you think the main project would find useful, but you don’t want to make them pull all your custom code?
I’ve run into this type of problem a bunch of times, so I’m making a really short post on how to make a pull request to a project for just a few specific commits.
↪ [Read more...]
03 Mar 2018 —
It was a nice evening, sitting around a fire under the stars with my new lab a few days after I started grad school. My girlfriend was visiting at the time, and I was gently teasing her about the way she pronounced the word “eggs” (“eygs”).
No one batted an eye.
It was then that she called me out for pronouncing “museum” as “myoo-zam.”
All hell broke loose.
↪ [Read more...]
15 Feb 2018 —
If I wanted to make this post sound professional and industrious, I would say that my motivations behind this project were because I’ve started working towards my Bayesian model of webcomic updates again, and that I’m taking an intermediate step by analyzing data from similar content creators.
But the truth is, I was just pissed off that I couldn’t read the manga I wanted to.
These are the Python lessons I learned scraping manga scanlations off of 4chan.
↪ [Read more...]
10 Feb 2018 —
If I wanted to make this post sound professional and industrious, I would say that my motivations behind this project were because I’ve started working towards my Bayesian model of webcomic updates again, and that I’m taking an intermediate step by analyzing data from similar content creators.
But the truth is, I was just pissed off that I couldn’t read the manga I wanted to.
These are the Python lessons I learned scraping manga scanlations off of 4chan.
Part 1: Logging is easy!
↪ [Read more...]
16 Nov 2017 —
What does a quintessential heartland rock song have to do with a traditional Japanese aesthetic?
American rock and roll legend Bob Seger isn’t Japanese. I’m not sure he’s ever even been to Japan. It makes it weird, then, that his music so perfectly captures a uniquely Japanese aesthetic, one infamously difficult to translate. What’s even weirder is that he can do this in a song about how he was constantly horny as a kid.
Of course, I’m talking about the mono no aware (物の哀れ) in his hit song, “Night Moves”.
↪ [Read more...]
28 Oct 2017 —
I’ve recently gotten into the fabulous world of mechanical keyboards. Other than the Aesthetics and the fact that building working electronics is cool, the main draw for me was the ability to completely customize how your board works. Just imagine all the chances for increased productivity!
Of course, good cognitive scientist as I am, I know that optimal performance requires a good grip of the statistics of the environment you’re working in. For us, that’s going to be our typing patterns! Here, I’ll show you how I was able to capture a huge amount of data on how I type, how to play around with it, and how to do the same for your own keyboard.
↪ [Read more...]
14 Nov 2016 —
I like statistical models. I like webcomics. I like not having to suffer through deciding whether a webcomic is ever going to update regularly again. I began to ask myself, “Can I use statistical modelling to tell me when I should stop hoping a webcomic will keep updating?”
Nothing is more haunting than that oft-repeated phrase: “updates when?” It’s not even about the wait, it’s about the uncertainty–either end or start updating, don’t keep me in limbo! I would love it if I could make a model that could just tell me, “Hey, this comic is entering its death spiral, abandon ship!” Also, I just like learning new statistical methods.
Although that model is still in the works, I’ve gotten my hands on a bunch of cool data in the meantime. This post isn’t quite a tutorial; it’s more like a demonstration of how you can have fun with simple web scraping and niche interests–but I’ve attached all the code I used, complete with documentation and a flexible design for newbies who want to start collecting their own data.
↪ [Read more...]