An introduction to productionized ETL (Extract, Transform, Load) processes using Python's OS library.

Tired of copying and pasting data from multiple spreadsheets into one Excel doc? Wish there were a way to automate this process in bulk? Fear not! A simple combination of pandas DataFrames and the OS library can save HOURS of your time, not to mention reduce the human error that comes with manual data entry.
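As a rough illustration of that combination, here is a minimal sketch that stacks every file in a folder into a single DataFrame. It assumes the spreadsheets have been exported as CSV files with matching columns. The function name `combine_csv_folder` and the `source_file` column are hypothetical choices for this example, not something taken from the article.

```python
import os

import pandas as pd


def combine_csv_folder(folder):
    """Stack every .csv file in `folder` into one DataFrame.

    Assumption: all files share the same column layout.
    """
    frames = []
    # sorted() keeps the row order reproducible across operating systems
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".csv"):
            continue  # skip anything that is not a CSV export
        frame = pd.read_csv(os.path.join(folder, name))
        frame["source_file"] = name  # record where each row came from
        frames.append(frame)
    if not frames:
        return pd.DataFrame()  # nothing to combine
    return pd.concat(frames, ignore_index=True)
```

Calling `combine_csv_folder("exports")` on a hypothetical `exports` folder would return one table with the rows of every CSV in it. For Excel workbooks, `pd.read_excel` could be swapped in for `pd.read_csv`, provided an Excel engine such as openpyxl is installed.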

The Python OS Module

Python’s OS library lets developers interact with their operating system in many of the same ways they would from a command line. It provides many useful functions to create and remove…
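To give a feel for that command-line style of interaction, here is a small sketch using a handful of standard `os` calls. The helper name `demo_os_basics` and the `staging/raw` folder layout are made up for illustration. Everything runs inside a temporary directory, so nothing on your machine is touched.

```python
import os
import tempfile


def demo_os_basics(base_dir):
    """Create a nested folder, add a file, list it, then clean up."""
    staging = os.path.join(base_dir, "staging", "raw")
    os.makedirs(staging, exist_ok=True)  # similar to `mkdir -p`

    path = os.path.join(staging, "example.txt")
    with open(path, "w") as handle:
        handle.write("hello")

    listed = os.listdir(staging)  # similar to `ls`
    os.remove(path)               # similar to `rm`
    os.rmdir(staging)             # similar to `rmdir`; folder must be empty
    return listed, os.path.exists(staging)


with tempfile.TemporaryDirectory() as tmp:
    print(demo_os_basics(tmp))
```

Building paths with `os.path.join` rather than by string concatenation keeps the same script working on Windows, macOS, and Linux.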


A step-by-step guide for what to do when you’ve got more data than one computer can handle.

At some point in your career as a nascent data scientist, you’re going to encounter a dataset that is simply too computationally demanding for your individual machine to process. Or maybe you’re working on a collaborative project and have to give different users read/write access to the same dataset. Either way, once you’ve outgrown the Jupyter Notebook and are ready to start building real storage solutions, there’s no better place than the cloud.

Getting Started with AWS

Regardless of which storage solution you choose, you will likely need to…

Apascale

To SQL or NOSQL. That is the query.
