We live in an era of unprecedented automation and rapidly expanding data. In 2019 alone, the internet generated an estimated 2.5 quintillion bytes of data every day, but only about 0.5% of that data was meaningfully analyzed! This data deluge, combined with the desire to derive more value ever faster, has many companies exploring new ways to tame the uncertainty of this rapidly growing digital world. These changes have driven remarkable growth in high-level programming, with Python seeing a historic 7% increase in use during 2019 alone.
With this growth come opportunities, but also challenges. Although Python and other languages offer incredible flexibility in a package anyone can learn, their popularity has led to the rise of the “incidental programmer”: the person who programs as part of their job, but for whom programming is not the job itself. For these individuals, the incredible initial value of automation can easily turn sour, because the challenges of engineering a solution scale with its size. And so many companies get stuck somewhere in the middle, using adequate solutions to automate their most painful tasks, but struggling to grow past small scripts. But there is a way forward!
The struggles that many small and medium-sized businesses face in fully leveraging programming are the same struggles that software engineers have had to conquer over the last 50 years. And luckily for us, there is plenty of documentation to work from (a set of “tips and tricks,” as it were) for engineering a solution that will not only solve the problem, but also scale with your team as you grow. By learning from what software engineers use to develop complex applications, we can intelligently apply the tools and techniques most likely to simplify our lives. Some of the most common tools and principles for keeping our programs simple as they grow are coding standards, linters, automated tests, and version control systems.
None of these practices simply improves the “purity” of your codebase; each translates into time, money and sanity saved for everyone on your team. Coding standards make code easy to read, which saves time when debugging and onboarding new employees and makes collaboration easier. Linters nudge you to conform to those standards, making it easier to understand and debug code when you come back to it later. Automated tests speed up testing and ensure that our code acts the way we think it should, significantly reducing both debugging time and downtime. And finally, version control systems provide not only the security of an offsite backup, but also many levels of verification and error-checking, which means fewer bugs for you and your team, a faster development process, and a demonstrably more robust engineered solution.
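To make the idea of an automated test concrete, here is a minimal sketch using Python's built-in `unittest` module. The function `mean_order_value` and its behavior on an empty list are hypothetical, invented purely for illustration; the point is only that the expected behavior is written down once and can then be re-checked automatically after every change.

```python
import unittest


def mean_order_value(order_totals):
    """Return the average of a list of order totals, or 0.0 if the list is empty.

    Hypothetical helper used only to illustrate testing.
    """
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)


class MeanOrderValueTest(unittest.TestCase):
    """Each test records one expectation about how the function should behave."""

    def test_typical_orders(self):
        self.assertAlmostEqual(mean_order_value([10.0, 20.0, 30.0]), 20.0)

    def test_empty_list_returns_zero(self):
        # Without the explicit empty-list check, this case would raise ZeroDivisionError.
        self.assertEqual(mean_order_value([]), 0.0)
```

Saved to a file, these tests can be run with `python -m unittest`. A linter such as flake8 or pylint could be run over the same file to flag style problems, and committing both the code and its tests to version control lets teammates confirm that nothing broke after each change.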
At Pandata, we take these lessons to heart and are always working to better align data science and data engineering. This dual emphasis on efficient prototyping and scalable solutions ensures that our models stay clean and robust while our development stays agile.
Chris Brace is a Data Analyst at Pandata.