Kaggle Deep Dive and Humpback Whales


When I watched the fast.ai videos, the instructor said it was worthwhile to just go through a bunch of Kaggle competitions, download the data, and then submit the results. So for a while I have wanted to spend a few hours submitting to a bunch of Kaggle competitions and becoming familiar with the ecosystem. I roped Mari into what became an involved afternoon of data munging.

First we installed the Kaggle CLI. There were some issues with the API token (the kaggle.json file), and we also had to accept the terms and conditions for each competition we were interested in, but once we figured this out the Kaggle CLI was relatively easy to use. It lets you download data and upload results pretty seamlessly. It does some other stuff, but I am not sure what that is.
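For reference, here is a minimal Python sketch of the setup that tripped us up. It assumes the token lives at ~/.kaggle/kaggle.json (the CLI's default location) and uses digit-recognizer as the competition slug. The helper names are made up for illustration, and the commands are only printed, not run. Accepting the competition rules still has to happen on the website.

```python
import os
import shlex
import stat
from pathlib import Path


def token_problems(token_path):
    """Return a list of human-readable problems with the API token file."""
    path = Path(token_path)
    if not path.exists():
        return [f"{path} is missing; download it from your Kaggle account page"]
    problems = []
    # The CLI warns when the token is readable by other users on the machine.
    if os.name == "posix" and path.stat().st_mode & (stat.S_IRGRP | stat.S_IROTH):
        problems.append(f"{path} is readable by others; run chmod 600 on it")
    return problems


def download_command(competition):
    """Build the CLI call that downloads a competition's data files."""
    return ["kaggle", "competitions", "download", "-c", competition]


def submit_command(competition, csv_file, message):
    """Build the CLI call that uploads a results file with a short message."""
    return ["kaggle", "competitions", "submit", "-c", competition,
            "-f", csv_file, "-m", message]


for problem in token_problems(Path.home() / ".kaggle" / "kaggle.json"):
    print("token:", problem)
print(shlex.join(download_command("digit-recognizer")))
print(shlex.join(submit_command("digit-recognizer", "submission.csv", "first try")))
```

Building the commands as lists means they could be handed to subprocess.run later without worrying about quoting the message.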

The first competition we looked at was the Digit Recognizer. The sample data is a CSV. I believe it comes from the MNIST dataset, which is a dataset of handwritten digits. In the training file, each row is a label followed by a list of pixel values. The pixels, if drawn out as a grid, would show a digit. The ML task is to guess the digit. We looked at some examples of how to do this, but most of our experience was with image classification, so we put this aside. Also, Mari is running fastai v3 (the latest), and there were some inconsistencies between the online samples and the v3 library.
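To make "the pixels, if drawn out" concrete, here is a small sketch in plain Python. It assumes the first column holds the label and the remaining 784 columns are a 28 by 28 grayscale image stored row by row. The data is a made-up vertical stroke, not a real row from the competition file.

```python
import csv
import io

SIDE = 28  # assumed image width and height; 28 * 28 = 784 pixel columns


def row_to_grid(pixels, side=SIDE):
    """Reshape a flat list of pixel values into a list of rows."""
    if len(pixels) != side * side:
        raise ValueError(f"expected {side * side} pixels, got {len(pixels)}")
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def render(grid, threshold=128):
    """Draw the grid as text: '#' for ink, '.' for background."""
    return "\n".join(
        "".join("#" if v >= threshold else "." for v in row) for row in grid
    )


def read_rows(csv_text):
    """Yield (label, pixels) pairs from CSV text that starts with a header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    for line in reader:
        yield int(line[0]), [int(v) for v in line[1:]]


# Made-up example: a vertical stroke down the middle, labelled 1.
fake_pixels = [
    255 if c in (13, 14) and 4 <= r < 24 else 0
    for r in range(SIDE)
    for c in range(SIDE)
]
fake_csv = "label," + ",".join(f"pixel{i}" for i in range(SIDE * SIDE)) + "\n"
fake_csv += "1," + ",".join(str(v) for v in fake_pixels) + "\n"

for label, pixels in read_rows(fake_csv):
    print("label:", label)
    print(render(row_to_grid(pixels)))
```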

We looked for an image classification project and found the Humpback Whale Identification competition. About 90% of the project involved creating a directory structure that fast.ai could work with and then manipulating the results into the right file format for submission. A fair amount of time also went to downloading the data and training the model, and to figuring out the correct functions in the fastai library for extracting labels and whatnot.
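The two munging steps looked roughly like the sketch below. It is not our exact code. It assumes a labels CSV with Image and Id columns, a folder-per-label layout for training, and a submission file where Id holds up to five guesses separated by spaces. The function names, file names, and scores are invented for the example.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def sort_into_label_folders(labels_csv, image_dir, out_dir):
    """Copy each image into out_dir/<label>/ based on a CSV of Image,Id rows."""
    with open(labels_csv, newline="") as f:
        for row in csv.DictReader(f):
            target = Path(out_dir) / row["Id"]
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy(Path(image_dir) / row["Image"], target / row["Image"])


def write_submission(predictions, out_csv, top_k=5):
    """Write Image,Id rows; Id is the top_k labels by score, space separated."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Image", "Id"])
        for image, scores in predictions.items():
            best = sorted(scores, key=scores.get, reverse=True)[:top_k]
            writer.writerow([image, " ".join(best)])


# Tiny made-up example in a temporary directory.
work = Path(tempfile.mkdtemp())
(work / "raw").mkdir()
for name in ("a.jpg", "b.jpg"):
    (work / "raw" / name).write_bytes(b"not a real image")
(work / "train.csv").write_text("Image,Id\na.jpg,w_001\nb.jpg,new_whale\n")

sort_into_label_folders(work / "train.csv", work / "raw", work / "train")
write_submission(
    {"c.jpg": {"w_001": 0.7, "new_whale": 0.2, "w_002": 0.1}},
    work / "submission.csv",
)
print(sorted(p.relative_to(work).as_posix() for p in (work / "train").rglob("*.jpg")))
print((work / "submission.csv").read_text())
```

Copying rather than moving keeps the original download intact, which matters when the download itself takes a while.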

It was very helpful to work with Mari because I got a sense of how to go about tweaking learning rates and freezing layers. A lot of this is still mysterious to me, and I think fast.ai makes it even more mysterious. But it was very useful to go through this project and try to apply the ideas from fast.ai. I would like to work regular Kaggle competitions into my programming practice. It is a really different way of thinking. I would not call it programming exactly, but a sort of debugging.