My Portfolio

The bigger posts and projects I’ve worked on that I want to share.
Published

April 10, 2021

Intro

Some of the posts on this blog are pretty big guides or documents that I like to reference and refer others to regularly. In addition, I have a few other projects that I haven’t written about, but that would still be nice to share. This post compiles all the bigger posts and projects I’ve worked on into one easy-to-reference article.

stats_can package

I am the creator and maintainer of an open source package called stats_can. It provides a Python interface to data and metadata from Statistics Canada. You can read the documentation for it here, and the code is available here. You can also see a talk I gave on what goes into building and maintaining the library here. Unfortunately the first few minutes of the talk were not recorded, but the full slide deck is available here.

Terra Mystica faction selection model

This is a project I worked on for a friend who’s really into the board game Terra Mystica. It scrapes a few hundred thousand games off of an online Terra Mystica gaming site, ingests and cleans the game data, and then trains a model to help determine which faction you should play, based on the starting layout of the game. Right now the model is a pretty simple linear regression. At some point I’d like to come back to this and try some Bayesian methods, both to learn more about them and to see if I can improve performance. You can see the code for the project here. The notebooks folder in that repository walks through the development process and was used to provide status updates to the friend I was building the app for. I also built a very rudimentary webapp that lets you plug in the starting conditions for the game and then returns a ranked list of all the factions based on those inputs. It’s hosted on the free tier of Azure, so it’s quite slow, but it’s available here. You can also pull down and deploy the docker container the webapp and model are stored in here.
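As a rough illustration of the ranking step described above, here is a minimal sketch. It is not the project’s actual code: it assumes one already-fitted set of linear weights per faction, and the feature names, the coefficient values, and the `predict_score` and `rank_factions` helpers are all made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical per-faction linear models: an intercept plus one weight per
# starting-layout feature. Every number here is invented for illustration.
FACTION_MODELS: Dict[str, Dict[str, float]] = {
    "witches": {"intercept": 100.0, "bonus_tile_a": 4.0, "player_count": -1.0},
    "nomads": {"intercept": 98.0, "bonus_tile_a": -2.0, "player_count": 1.5},
    "darklings": {"intercept": 102.0, "bonus_tile_a": 0.5, "player_count": -0.5},
}


def predict_score(model: Dict[str, float], features: Dict[str, float]) -> float:
    """Linear prediction: intercept plus weight * value for each feature."""
    score = model["intercept"]
    for name, value in features.items():
        # Features the model has no weight for contribute nothing.
        score += model.get(name, 0.0) * value
    return score


def rank_factions(features: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (faction, predicted score) pairs, best predicted score first."""
    scored = [
        (faction, predict_score(model, features))
        for faction, model in FACTION_MODELS.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    starting_layout = {"bonus_tile_a": 1.0, "player_count": 4.0}
    for faction, score in rank_factions(starting_layout):
        print(f"{faction}: {score:.1f}")
```

Keeping the prediction and the ranking as separate functions means the linear model could later be swapped for something else without touching the ranking code.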

Rent or Own model

This package is designed to help me understand the financial trade-offs between renting and owning a home. Other online calculators exist, but they’re typically designed for the US, and even the Canadian ones I could find were meant for Ontario. Since I live in Alberta, and I like this sort of thing, I thought it would be fun to build my own. You can check out the code here. From the readme on that page, or at this link, you can open a binder container that will let you play around with it in a Jupyter notebook. At some point in the future I might turn it into a full-fledged web app, but for now the notebook suits my needs.

Data Structures and Algorithms

My job and interests are both more directly related to data science/engineering than classic software engineering. With that said, I do still like to code, and I think having a solid grasp of at least the basics of CS is handy for anyone who spends a lot of time writing code. To that end, in addition to completing a Data Structures and Algorithms course at Athabasca University, I like to do the Advent of Code challenges every year:

  • Here’s my 2022 advent. I’ll also use this one for future years. In addition to putting everything in a package as I’d done in previous years, this year was the first one where I set up my development environment in a dev container, and a lot of what I learned went into building my devcontainers repo. I also built up documentation and a write-up of my notes on the puzzle solutions, both of which were published to readthedocs.
  • Here’s my 2021 advent. This year I made a little data pipeline with a docker file to scrape and read in the puzzle inputs automatically.
  • Here’s my 2020 advent. This year I was also learning package development and type hinting, so the repository and code are really overengineered for what should be a daily coding kata. It was fun to work on though, and I learned a lot.
  • I didn’t fully complete the 2019 advent. That was the year you implemented an assembly language emulator and built on it through the course of the project. Between that (not really my area of interest) and a whole lot of really tedious maze problems near the end (maybe I should come back to those) I didn’t really have the motivation to complete the last few days. That was the first year I coded along live though, so that was a good challenge.
  • My 2018 advent is a bit of a weird one. I used the advent challenges as the material for a weekly coding tutorial I ran at work. To facilitate that format some of the code is done in jupyter, and a lot of the solutions are less efficient/elegant than I might have done in another context, but they worked to illustrate whatever concept I was covering at the time.

Bigger posts

This section will include links to the larger and more involved blog posts I’ve made.

Homelab cluster

I’ve been learning about proxmox, virtualization, and kubernetes in my homelab. The first in the series on configuring proxmox is here. I also took a stab at getting Kubernetes the Hard Way going on my local setup. I got most of the way through it before realizing that there were a lot of differences between my home setup and the cloud, and that there was some more fundamental learning I’d have to do before that exercise was super helpful. I did learn a decent amount though: some about kubernetes, more about terraform.

Mortgage modeling

When I was planning to buy my house, I wanted to understand whether it made sense to go with a variable or fixed rate mortgage. As part of that analysis I pulled in some historical data and looked at how that decision would have worked out for me over the last few decades. The post is here.
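To give a flavour of the kind of comparison involved, here is a minimal sketch. It is not the code from the post: the `monthly_payment` and `interest_paid` helpers and the rate paths in the usage example are hypothetical, and it assumes simple monthly compounding with the rate reset once a year, which may not match how a given lender quotes or compounds rates.

```python
from typing import Sequence


def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Standard level-payment amortization formula (monthly compounding)."""
    n_payments = years * 12
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        # No interest: the principal is simply split evenly.
        return principal / n_payments
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -n_payments)


def interest_paid(
    principal: float, annual_rates: Sequence[float], amortization_years: int
) -> float:
    """Total interest over len(annual_rates) years.

    Assumption: the rate changes once per year, and the payment is
    recomputed each year over the remaining amortization period.
    """
    if len(annual_rates) > amortization_years:
        raise ValueError("rate path is longer than the amortization period")
    balance = principal
    total_interest = 0.0
    for year, rate in enumerate(annual_rates):
        payment = monthly_payment(balance, rate, amortization_years - year)
        for _ in range(12):
            interest = balance * rate / 12
            total_interest += interest
            balance -= payment - interest
    return total_interest


if __name__ == "__main__":
    # Made-up rate paths for a five year term, purely for illustration.
    fixed_path = [0.05] * 5
    variable_path = [0.04, 0.045, 0.05, 0.055, 0.06]
    print("fixed:", round(interest_paid(300_000, fixed_path, 25), 2))
    print("variable:", round(interest_paid(300_000, variable_path, 25), 2))
```

Swapping the invented rate paths for historical rates turns this into a backtest of how each choice would have played out over a given term.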

Where to live app

This post documents the process I went through to build a where to live app. It scrapes listings from rental and sales sites daily, and then combines them with other data sets like commute times, grocery store locations, and flood risk to produce personalized lists of candidate listings. The blog describes the implementation in more detail, along with some important lessons I learned and links to the code. The blog post is here.
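The combining step can be pictured with a small sketch like the one below. This is only an illustration under assumptions, not the app’s implementation: the `Listing` fields, the `candidate_listings` helper, and the thresholds and sort order are all hypothetical, and it assumes the commute, grocery, and flood data have already been joined onto each listing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Listing:
    """A scraped listing with other data sets already joined on (hypothetical fields)."""

    address: str
    monthly_cost: float
    commute_minutes: float
    grocery_km: float
    in_flood_zone: bool


def candidate_listings(
    listings: List[Listing], max_cost: float, max_commute: float
) -> List[Listing]:
    """Drop listings that fail personal hard limits, then sort the rest.

    Sorting by commute first and grocery distance second is an arbitrary
    choice for this example.
    """
    keep = [
        listing
        for listing in listings
        if not listing.in_flood_zone
        and listing.monthly_cost <= max_cost
        and listing.commute_minutes <= max_commute
    ]
    return sorted(keep, key=lambda l: (l.commute_minutes, l.grocery_km))


if __name__ == "__main__":
    # Invented sample data.
    sample = [
        Listing("1 Example St", 1800.0, 25.0, 0.8, False),
        Listing("2 Example Ave", 1500.0, 40.0, 0.3, False),
        Listing("3 Example Rd", 1400.0, 15.0, 1.2, True),
    ]
    for listing in candidate_listings(sample, max_cost=2000.0, max_commute=30.0):
        print(listing.address)
```

Because the limits are passed in as arguments, different people can get different candidate lists from the same scraped data.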

Paper Reproduction - “Alberta’s Fiscal Responses To Fluctuations In Non-Renewable-Resource Revenue” in python

This is a reproduction I did of a paper that was published by the University of Calgary’s School of Public Policy. In the course of reproducing the paper I actually found a data error in the original. After correcting the error, the conclusions of the paper do not appear to be supported. It was an interesting exercise in coding and in reproducibility in science. The blog post is here.

Automating provisioning Arch

I use Arch Linux btw. This post links to a three-part series I did about automating the setup of my workstations and servers. It goes from a detailed breakdown of the bash script that’s used to get a bare bones arch install, through using ansible to do all the system configuration, software install, and server setup (mostly a bunch of docker containers), to setting up a user profile using rcm.

Python packaging guide

There are lots of great guides on how to build a python package. I’m personally a big fan of hypermodern python and use its accompanying cookiecutter whenever I’m setting up a new project. But there’s not much out there that walks you through all the phases between just having a script that you can run and building a full-blown package. Also, most guides don’t cover conda, and I find the conda docs are really tuned towards people who are bundling C code or something else with their package, not just making a pure python package. To bridge this gap, and to help cement my own understanding of packaging, I wrote this guide.

Setting up a data science environment in Windows

This guide is for everyone out there who only has a locked-down Windows machine, but still wants to work with data in python. Almost all the guides I could find online assumed you had a Mac or Linux machine, and even the Windows ones often assumed you had administrative rights. This guide gets you set up with conda, git, and vs code, all without elevated privileges. The only thing I haven’t managed to do in a Windows environment is build a python package. I’m sure it’s doable, but this guide is for those writing scripts/notebooks.