Before Nick Napier realized he wanted to use machine learning to optimize global travel, prevent mental health crises, and alleviate some of businesses’ biggest problems, he was hooked on the launch pad at Cape Canaveral.
Read on to discover how the boy who wanted to be the go-to guy in a NASA control room ended up as a machine learning specialist at Pandata.
I spent a few years of my childhood in Florida, where I saw multiple space shuttle launches; my school even had a space shuttle simulator. That's where my enthusiasm for science and technology began.
As computers became a staple of the everyday household, I fueled my passion for flight through flight simulator games. Combining that passion with a natural enjoyment of math led me to aerospace engineering and spacecraft design. I attended the University of Illinois at Urbana-Champaign and obtained my master's in Aerospace Engineering.
The pivot from aerospace engineering to data science isn't as extreme as one might expect. My initial exposure to machine learning came in the form of robotics. Typically, when designing a robotic system, say a mechanical arm that moves an object much like a crane does, you need to know multiple aspects of the system such as the geometry, motor power, motor response, and so on.
By utilizing machine learning, we can bypass the need to mathematically derive the full physics of the system and instead train a predictive model to approximate the required control behavior. Once I learned that these models could approximate the response of a complex system, I was hooked.
My first machine learning application used neural networks. Since then, that has been my go-to model of choice. Specifically, I find image analysis and generation methods extremely fascinating as they are becoming more prominent in life.
NVIDIA CEO Jensen Huang, for example, has showcased the true power of these methods by rendering a 3D model of himself in conference videos that attendees could not identify as fake. These techniques have expanded into multiple areas, such as using ML to generate additional frames of a video, creating realistic 3D renderings from sensor data, and using computer vision to calculate distances from images. The applications are starting to become endless.
Everything is growing. GPT-3, the neural network model that can generate remarkably human-like text, uses around 175 billion parameters. Networks of that scale are possible because of advances in computing power, and the enormous growth in data collection over the past couple of decades has helped alleviate the curse of dimensionality. Data scientists used to work with around 10,000 data points; now they can work with millions.
The advantage of this growth is that companies can finely tune and adjust models to their specific problems, learning more about the intrinsic properties that humans can’t pick up on. The disadvantage is that it takes longer (and typically costs more) to get to a final product.
As a business, it’s easy to identify an issue that can be solved with machine learning. The challenge is figuring out the next step beyond using the ML model. For example, if you deploy ML to monitor for insider threats, what’s the next step your organization must take after the ML model starts alerting you to suspicious user activity? Oftentimes, companies forget to consider the human actions that truly make AI successful.
AI-ready organizations must also understand the concept of model drift. Unlike traditional technology, AI requires frequent management and updating. Changing environmental or societal factors often necessitate the retraining of ML models.
One challenge is the ethics around AI implementation. Take image processing, for example.
We are now at the point where we can take old images and produce new 3D images from them.
But how do we use these tools to generate new outputs in a way that's not damaging? I'm thinking of deepfakes, generative tools like DALL-E, and similar applications of AI.
Here's the truth: datasets remain biased toward males and white people. When models trained on these datasets are deployed on more diverse populations, they start to fail. Representation and fairness learning throughout every stage of AI design are two major areas in data science that need more resources and research.
At a research lab in California, I focused on applying ML methods to anomaly prediction within behavioral analytics. My favorite work, and the most impactful to me, involved working on a suicide prediction model.
The application required extreme anomaly prediction—10 events in 450,000. Not only was this a challenging project from a data science perspective, but I also knew that the results I produced would help provide intervention for someone in need. Providing impactful results is the most satisfying feeling, especially when it is potentially saving a life.
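The original project's data and model are not public, so as an illustrative sketch only: with roughly 10 events in 450,000 records, a naive model would score near-perfect accuracy by predicting "no event" every time. One common way to counter that is to reweight the loss so the rare class is not drowned out, as in scikit-learn's `class_weight="balanced"` option (the synthetic data below is invented for the demo):

```python
# Illustrative sketch of extreme class imbalance (~10 positives in 450,000
# rows). Data is synthetic; the positives are shifted so they are learnable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, n_pos = 450_000, 10

X = rng.normal(size=(n, 5))
y = np.zeros(n, dtype=int)
pos_idx = rng.choice(n, size=n_pos, replace=False)
y[pos_idx] = 1
X[pos_idx] += 3.0  # make the rare events separable for the demo

# stratify keeps a few positives in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# class_weight="balanced" upweights the handful of positives in the loss
# so they are not drowned out by ~450,000 negatives
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

# accuracy is meaningless at this imbalance; rank-based metrics like
# average precision (area under the precision-recall curve) are not
scores = clf.predict_proba(X_te)[:, 1]
print(average_precision_score(y_te, scores))
```

The key design point is the evaluation metric: at this imbalance, precision-recall measures are informative where raw accuracy is not.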
Decompose the business problem to a specific application scope. Too many times I have heard a senior leader discuss implementing a machine learning solution to a problem without realizing it is really three or four separate, smaller issues to address.
For instance, when working on a vehicle routing problem I was told we needed to provide better “optimization.” As a data scientist, my question was, “What kind of optimization?” The organization clarified their need for both vehicle path optimization and a restructuring of the whole transportation network. Because we broke down the root problem into separate parts, we were able to scope the work and ML capabilities appropriately.
To proactively make a project run smoother, I would also recommend that an internal analyst clean and prepare the data, then carry out an exploratory data analysis. This provides a foundational starting point for a couple of data scientists to come in and begin the feature engineering and machine learning modeling efforts. Plus, having clean data before machine learning work begins can save 50-80% of the time a data scientist needs to deliver a solution.
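The cleaning-and-EDA pass described above can be sketched with pandas. This is a minimal, hypothetical example; the column names (`route_id`, `duration_min`) and rules are invented for illustration, not taken from any Pandata project:

```python
# Minimal sketch of the pre-modeling cleanup an internal analyst might do:
# drop rows missing the key, de-duplicate, flag implausible values, and
# print a quick exploratory summary. All data here is made up.
import pandas as pd

df = pd.DataFrame({
    "route_id": [1, 1, 2, 2, 2, None],
    "duration_min": [35.0, None, 41.5, 41.5, 1e6, 38.0],
})

# basic cleaning: rows without a key are unusable; exact duplicates add noise
df = df.dropna(subset=["route_id"]).drop_duplicates()

# flag implausible values instead of silently keeping (or deleting) them
df["duration_suspect"] = df["duration_min"] > 24 * 60  # longer than a day

# quick exploratory summary for the data scientists to start from
print(df.describe(include="all"))
print(df.isna().mean())  # share of missing values per column
```

Flagging suspect values rather than dropping them preserves the decision for the data scientists doing feature engineering.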
Narrow your focus. Someone who wants to be an expert in "data science" will never get there; the destination is too broad. Over the next 10 to 15 years, generic "data science" roles will likely be phased out in favor of more specialized fields. My advice is to focus on one aspect of data science and keep a collection of tools for the other areas.
For example, my focus is ML operations architecture, and I keep a handful of go-to tools for predictive modeling. Those tools give me enough confidence to get a project rolling, but I never claim or try to be an expert in every facet of data science.
When I am not at work, I like to explore creativity by coding Dungeons & Dragons worlds into virtual tabletop games. In the physical tabletop game, you roll the dice and add them up on paper; that's your outcome.
On the computer, I write code that automatically looks up connections between the random numbers and the game play options. The code automatically calculates the game so that people can spend less time doing math and more time playing.
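The kind of automation described above can be sketched with a small dice-roll parser. This is a toy example assuming the common `NdS+M` notation (e.g. `2d6+3`); it is not taken from any specific virtual tabletop's API:

```python
# Toy sketch of automating tabletop dice math: parse "NdS+M" expressions
# (N dice with S sides, plus modifier M) and return the total.
import random
import re

def roll(expr, rng=None):
    """Roll dice written as 'NdS+M', e.g. '2d6+3', and return the total."""
    rng = rng or random.Random()
    m = re.fullmatch(r"(\d+)d(\d+)([+-]\d+)?", expr.replace(" ", ""))
    if not m:
        raise ValueError(f"Bad dice expression: {expr!r}")
    n, sides, mod = int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)
    # sum the individual dice and apply the modifier, so players
    # spend less time doing arithmetic and more time playing
    return sum(rng.randint(1, sides) for _ in range(n)) + mod

print(roll("2d6+3"))  # a value between 5 and 15
```

A real virtual tabletop would hook a function like this into character sheets and game-play options, but the core idea is the same: the code looks up the rule, rolls, and totals automatically.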
Our team of data scientists, including Nick, regularly contributes insights to Pandata's Voices of Trusted AI email digest. It's a once-per-month email containing helpful trusted AI resources, reputable information, and actionable tips.