In today’s rapidly evolving world, data science has emerged as a powerful force driving innovation and transformation. From unraveling complex patterns to harnessing the potential of AI, data scientists play a pivotal role in shaping the future. Among these passionate problem-solvers stands Nicolas Decavel-Bueff, a driven data scientist who finds his purpose at the exciting intersection of AI, ethics, and problem-solving.
Raised in a household of high achievers, Nicolas was never going to be satisfied with career constraints. Continue reading to learn how the beginning of his career on the product and R&D side helped inspire his passion for AI ethics and fairness, and for helping others understand AI.
I grew up in a European family – both of my parents migrated to the U.S. in pursuit of more opportunity. Given that my parents went far in their education, it was no surprise that they instilled this academic focus in me growing up. Having a twin brother made that competitive, academic atmosphere all the more intense, as we constantly tried to “one up” each other, both in school and in the occasional mental-math challenge.
While I’ve relaxed on the competitiveness, I’m enjoying the journey of learning and still participate in the occasional friendly competition. A big driver for me, believe it or not, is my love for board games. You see, data science, in a lot of ways, feels just like a complex board game. It has some of those technical aspects that I really love doing, but it’s very freeform in that the solution isn’t always clear—and it’s fun to go through the process of getting to one.
I did my undergrad at Pitzer College, where I studied math, economics, and computer science. I was really trying to push the school to offer a minor in data science, but they were still a year or two away from formally establishing one.
My senior year, I took a small math class of around five or six people. It was in differential equations, and I really enjoyed it. My professor encouraged me to apply for Google’s Applied Machine Learning Intensive beta program. It was an amazing experience, and after the three-month program, I felt ready to enter the workforce. Yet I realized there was still a huge gap in my understanding of the theory behind machine learning and data science.
That led me to my master’s program at the University of San Francisco, where I got my graduate degree in data science. Part of the program is a “Practicum,” where they partner you with a company for nine months to work part-time as a data scientist. I worked for a company called Canal AI, where I built a variety of models, focusing on things like recommendation systems and classifying supermarket products using natural language processing techniques.
That led to an easy transition to my next role at Urbint, where I worked on an amazing team and built a variety of machine learning products. I built models predicting staffing needs for call centers and models for identifying cross-bore threats, and I led the engineering efforts for the main product’s R&D team. It was a lot of fun.
That’s where I learned one of my favorite phrases: “perfect is the enemy of the good.” To me, it describes how, when we’re chasing that idea of ‘perfect,’ we can get so tangled up in the little things that we end up missing the big picture, or we keep pushing back the finish line.
It’s weird, because in academics, you never settle for “good.” If I could do it better, I was going to do it better. But in industry, that mindset is where a lot of bottlenecks come from.
And now I’m at Pandata. A lot of things are shifting in AI right now, with ChatGPT and large language models generating a lot of hype. I wanted to be in a place where I wasn’t constrained by a single product, where I could work on a lot of different problems and grow my expertise laterally rather than going deep on one specific type of modeling.
There are some things that I’m always interested in, and there are others that are more dependent on current trends.
The things I’m always interested in involve AI ethics and fairness. At Google’s Applied Machine Learning Intensive, a huge part of the program was asking, “Where does AI go wrong?” We learned to make sure we were creating solutions that didn’t help certain communities at the expense of others.
Beyond that, AI explainability is an exciting topic. You always hear AI being called a “black box,” with people saying they don’t know what it’s doing. But there are a lot of really interesting techniques for adding that explainability layer.
A lot of AI comes down to trust. If I hand a client a model and say, “I don’t know why the AI said that, but here’s the model, and I trust the results,” they’ll respond, “Well, I can’t just take your word for it.” They want to validate it themselves.
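One common technique for adding that explainability layer is permutation importance: shuffle one feature at a time and measure how much the model’s accuracy drops. Here is a minimal NumPy sketch (the “model” is a toy threshold rule I made up for illustration, not anything from the interview):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Two features: one the model actually relies on, one pure noise.
signal = rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([signal, noise])
y = (signal > 0).astype(int)

def predict(X):
    # Stand-in "model": thresholds the first feature.
    return (X[:, 0] > 0).astype(int)

base_acc = np.mean(predict(X) == y)

# Permutation importance: shuffle each column in turn and
# record how much the accuracy drops without that feature intact.
importances = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importances.append(base_acc - np.mean(predict(Xp) == y))
```

Shuffling the signal column destroys most of the accuracy, while shuffling the noise column changes almost nothing — which is exactly the kind of evidence that lets a client see where the model is actually looking, rather than taking your word for it.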
One of my favorite topics is what we call target or data leakage. If you don’t set up the data properly, you’ll have a model that seems to be doing amazingly well. Then, once it’s in production, all of a sudden it won’t do nearly as well. You see these kinds of problems happen a lot. I’ve been added to projects that were initially set up with some accidental data leakage, and I had to go back, find the issue, and fix it. Then, all of a sudden, you see a huge amount of lift.
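As a toy illustration (my own sketch, not a project from the interview), imagine a “leaky” feature that is only recorded after the outcome is known — a field filled in once the label already exists. Offline it looks like a miracle feature; in production it simply isn’t available yet:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)  # the outcome we want to predict

# Legitimate feature: weakly correlated with the outcome.
x_honest = y + rng.normal(0, 2.0, n)
# Leaky feature: recorded after the fact, so it nearly encodes the label.
x_leaky = y + rng.normal(0, 0.1, n)

def accuracy(x, y):
    # Trivial "model": predict 1 when the feature exceeds 0.5.
    return np.mean((x > 0.5).astype(int) == y)

honest_acc = accuracy(x_honest, y)   # modest, but real
offline_acc = accuracy(x_leaky, y)   # looks amazing in evaluation
# In production the leaky value doesn't exist yet, so that column
# is effectively uninformative noise.
production_acc = accuracy(rng.normal(0, 0.1, n), y)
```

The offline number is near-perfect while the production number collapses toward a coin flip — the “huge amount of lift” from fixing leakage is really the evaluation becoming honest.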
Right now, my focus is more on large language modeling. It’s really exciting what you can do with tools like ChatGPT, and also with a lot of the other tools being built around it. In my free time, I’m building web apps in languages I never thought I could learn, using ChatGPT as a tool to help me do so.
A lot of people think AI is going to be their solution. They have a problem and they think, “Let’s throw AI at it.”
That’s almost never the right way to approach it.
Rather than starting with AI in mind, you always want to know what the problem is. What are we solving, and what data do we have? Once we have a model giving us an output, how do we use that output to fix the problem?
You know why AI projects fail? There are many reasons, in my opinion. One of the biggest is that you’ve painted yourself into the corner of “we’re going to use AI to solve this problem,” instead of thinking of AI as one tool among many and asking whether it’s the right tool for the problem.
In general, any time I talk to someone who’s really pushing to utilize AI, that’s when I need to take a step back and ask, “Why are we pushing for AI? What are we currently doing? Why is that failing? Can we use AI to fix it?”
It’s definitely related to that myth that AI is always the solution. I would say that you need to do the legwork. You need to set up the problem well.
Usually when we think of AI, building the model is such a small part of the process. A large part is the design, where we ask, “What problem are we solving? How are we defining success? How is this model going to be used? What does the data look like? Do we need more data?”
There are all these questions you need to answer in a very investigative way to ensure that once you get down to creating a model, it’s a quick step to then get those results.
To touch on that data side of things, one of my favorite sayings as a data scientist is “garbage in, garbage out.”
So if you have bad data and you’re expecting great results, you need to change those expectations. Bad data can sometimes be an indication that machine learning is not the solution.
And I know there’s a lot of talk about synthetic data too, where companies pay to have synthetic environments created from their data. I don’t have personal experience with synthetic data, but whenever you want to test how well a model is doing, you want to give it data that’s as realistic as possible, as close as you can get to what the model will actually see.
So synthetic data can be great for training your model, but when you’re testing it, you always want to include data that’s realistic for how the model will be used in the future.
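A small sketch of why (entirely illustrative numbers, not a real project): train a model on synthetic data, then score it on both a synthetic holdout and a sample of realistic data whose distribution is slightly shifted. Only the second number is an honest estimate of production performance:

```python
import numpy as np

rng = np.random.default_rng(42)

# "Synthetic" data we generated for training: centered at 0.
x_syn = rng.normal(0.0, 1.0, 5000)
y_syn = (x_syn > 0.0).astype(int)

# "Real" production data: the distribution is shifted by 0.5.
x_real = rng.normal(0.5, 1.0, 5000)
y_real = (x_real > 0.5).astype(int)

# One-parameter "model": a decision threshold fit on the synthetic data.
threshold = x_syn.mean()

def accuracy(x, y, t):
    return np.mean((x > t).astype(int) == y)

syn_acc = accuracy(x_syn, y_syn, threshold)    # scored on synthetic holdout
real_acc = accuracy(x_real, y_real, threshold) # scored on realistic data
```

The synthetic score comes out near-perfect, while the realistic test set reveals the drop you would actually see once the model meets real-world data.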
As a data scientist, in the past year, I felt a little nervous with all the talk of ChatGPT being able to code. “ChatGPT can do this, this, and this.” And it seems like we’re on this fast-moving freight train of large language models.
LLMs have great potential. But as a data scientist, it’s a tool, and we’re only as good as our toolkits. Getting comfortable using large language models, staying up to date on those topics, and maybe building something yourself can definitely help you stay relevant with the toolkits people are talking about.
On a high level, if you talk to a lot of business owners about AI, the first thing they’re going to say is, “Hey, I heard about ChatGPT…” You want to be able to quickly respond and share that there’s a great use case for solutions like this, but it’s not a solution for every use case. There’s still a lot of value in those traditional machine learning techniques.
Our team of data scientists, including Nicolas, regularly contribute their insight to Pandata’s Voices of Trusted AI email digest. It’s sent every other month and contains helpful trusted AI resources, reputable information, and actionable tips.