Kathryn Berger is the data science team leader at Agrimetrics.
She shares her data science journey from the beginning to now and what she looks for as a Team Lead when hiring data scientists for her team.
Kathryn spent most of her career harnessing geospatial data to solve global health and food security challenges — she’s no stranger to predictive disease models and maps.
She’s also explored risk maps and models in the agriculture and food sector, finding new ways to apply geospatial data.
Currently, she’s using her data science skills to help others make better use of their data: to explore it, add value, and understand its power.
Her interest in geospatial data goes back to grade school. She saw the Challenger disaster unfold, and as a result, she and her generation benefitted from the many NASA-funded studies and programs put into the school systems to get young people interested in space and satellites.
In school, they used satellite data to explore forest health, going through the intricacies of seeing how a forest stand is doing by exploring data from space.
And so she was hooked; she still is today.
I never really considered it being data science until not too long ago — I’d say five years ago.
I was developing predictive models, using programming skills, solving issues. I hadn’t thought of myself as doing data science until it was actually labeled data science.
I considered myself a research scientist, to be honest.
It was just scientific research.
I enjoy exploring and solving problems with satellite data.
If somebody came to me with a problem to solve an issue or find patterns, that’s what I get most excited about and that’s where I targeted my career.
I followed problems that interested me. Sometimes they’d call me a disease mapping specialist or a research associate in disease patterns.
It was never that they called me a data scientist.
I always thought data science was all programming; that you must have a degree in computer science or a computer science background to get into data science.
What I found was that I was doing a lot more problem-solving. I used programming; I explored big data and developed predictive models, only I didn’t call myself a data scientist.
I didn’t think I had the background to call myself that. So I went with research scientist, solving problems, like looking at where the next pandemic flu outbreak would be.
It was a bit of a transition.
I never planned on being a team leader. It was through a chain of events that I ended up becoming one. I was just doing my work and speaking up for and representing my team and data science within the organization.
So my Team Leader position organically evolved.
Yes, I’d say that’s a fair representation of what happened.
Within our data science team, we have a terrific array of backgrounds.
The entire team isn’t just made up of folks who studied computer science or have done programming and considered themselves data scientists from the start.
Many of us come from different backgrounds, such as statistics, Earth observation, or crop modeling. Everybody brings a unique background and a different skill set to the table, and we all use our data and science skills to solve the problems at hand.
Yes, we look for programming skills. Yes, we’re looking for those machine learning, statistics, and quantitative problem-solving skills to take models, develop them, and further them.
We want all of those skills.
But we also look for folks who can see the big picture and use the soft skills we often talk about: understanding the business issue and communicating with a client.
We value the ability to develop really cool models and to solve problems. We also look for the ability to understand the problem the client needs to solve, understand the data, advocate better use of the data, or what data is missing.
Communication is just as important as the other data science skills — visualization or developing a fantastic model. Those are also components of the data story.
Plus, you need domain knowledge to make the story whole and put the pieces together. Sometimes, having domain knowledge is the only way to know if a model will make sense in real life.
Add enthusiasm and willingness to go after a problem, and you have a winning team.
Naturally, we require prerequisite skills for the job — programming in Python, for example. We primarily work with Python.
Whether you work with Python or R, a lot of those skills are transferable. Once you know one, you can extend that knowledge and learn others. That’s something you can always learn on the job and adapt.
Having that willingness and ability to learn quickly is helpful. We’d look at your prior background and your quantitative skills.
What kind of models have you developed further? What are your experiences in developing large predictive models? What kind of data have you worked with?
We’d want to get a good idea of what you’ve explored, how you think about it quantitatively, how you consider the problem, and how you can scale out.
We don’t want to develop just that one-off model that’s going to solve a problem — we want to figure out how to develop and deploy a model at scale.
That takes a whole different set of skills. Often, those skills overlap, but some extend out of data science to DevOps or engineering work.
Can I show up without a formal education, with the skills you mentioned before if I can document the journey and show you the things that I’ve worked on? Would you hire me?
It’s not a straight answer because it’s a combination of things I look for.
Our entire team has gone through formal training — most have gotten a Ph.D. within their own field.
If you have enough field experience within data science and you can demonstrate skills, that would be helpful. Show where you can go and the models you’ve built.
Having a formal education is certainly not a 100% requirement. Still, it often makes you much more competitive.
Many folks are self-taught programmers who can demonstrate their growth, how they’ve evolved into their career and gotten to where they are.
We often have a problem that we prepare for the candidates.
We’d like to see how they solve it.
We’d give them a data set and want them to take it, make it theirs and develop a model off it to demonstrate their capabilities and their data science skills.
I remember being impressed by somebody who had taken the data available on our Agrimetrics data marketplace. Without any prompting, they developed their own models and data visualizations. They explored trends across data sets that weren’t linked and had no direct connection.
They showed and explained to us what trends they found.
Somebody took the initiative to demonstrate a model showing different data feeds and how they would better tell a story from that.
Talk about your portfolio items or show the panel something they might find useful or interesting.
That kind of initiative excited us because you can’t teach that.
It’s not a skill you read in the books. It’s something you have or you don’t.
That’s a big plus.
Data science is not all about beautiful models and cool machine learning tools.
There is a lot of data cleaning, data processing, data wrangling — the less sexy side of data science that you don’t hear people talk about on podcasts.
But that’s also a significant component of the day-in-day-out skills you need as part of the job.
Balancing the less exciting things with developing attractive models is undoubtedly a challenge.
But it’s part of the process and you need to see the bigger picture.
What’s the goal here? What are we trying to identify? What kind of problem are we trying to solve with the data set at hand? What do we need to do to take it from point A to point B?
Too often, people build elaborate models with bells and whistles. They keep adding and exploring.
But then there’s making efficient things that are going to scale and deploy out.
You could build an elaborate model to solve a significant problem. Still, if it takes 15 minutes to run while somebody’s waiting on the other side of your desk, or on your company’s website, it’ll be difficult to deploy.
That’s the business end — getting results at a reasonable rate.
There’s always more R in that R&D process… Sometimes it’s easy to get carried away.
When left to our own devices, we’ll explore different facets of the problem. We have to remind ourselves it’s a business problem. We need to make the most out of the technical benefits of the model while also making sure it’ll fit the end-user requirements for their business problem.
We must strike a balance with the scope, the requirements and the results.
It is an iterative process.
We keep going back to the client or end-user. We work in an agile way, checking in and sharing the progress made for, and with, the client.
Then we make sure the feedback informs further product development or problem-solving while we keep within scope.
We document that first, and we keep checking back with the person or organization we’re working with to confirm that what’s been done suits their needs.
That’s a lot of communication and soft skills — identifying the problem, understanding the client’s requirements, thinking, and translating that into a model.
Each organization will have its own setup.
But it’s important to communicate the data science requirements and needs and understand the actual business issue — the problem that needs to be solved.
Doing so will only make you stand out.
It’s relatively easy to sit behind a computer, explore the data and solve a problem.
But it’s not easy to relay the data and the requirements, or to communicate the results and what they mean. That’s what’s essential.
It’s going to be increasingly important. Data science tools are becoming more accessible to people, and things are getting easier.
Being able to identify and translate the problem, figure out the rest of the components, and communicate with the client, sitting with them and showing them how their data can benefit them, is valuable.
Many times people have data but they don’t know what to do with it. If you can work with them and help them figure out how they can save time, money or make things efficient, you’ll stand out.
There are a lot of outstanding models out there and that’s only developing and further expanding.
The tools and skillsets are also more accessible.
The question is this:
Why would you use a neural network versus random forest model?
It’s about understanding why you apply the machine learning methodologies to the problem and the data set.
You need domain knowledge to make better sense of the black-box models of machine learning and data science.
You also need to advocate why you’re choosing one thing over another — what other pieces, data sets, or additional models you could add on.
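As a rough, hypothetical illustration of that model-choice question (the dataset, models, and parameters below are my own additions for this write-up, not anything discussed in the interview), here is a sketch in Python with scikit-learn that fits both a random forest and a small neural network on the same tabular data:

```python
# A hypothetical comparison: random forest vs. a small neural network on
# the same tabular dataset. Neither is "better" in the abstract; the right
# choice depends on the data, the problem, and how explainable it must be.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A small synthetic dataset stands in for real domain data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

print("forest accuracy:", forest.score(X_test, y_test))
print("network accuracy:", net.score(X_test, y_test))

# The forest also exposes per-feature importances, one reason it can be
# easier to explain to a domain expert than a neural network.
print("feature importances:", forest.feature_importances_)
```

The point is not the accuracy numbers themselves but being able to argue, from the domain and the data, why one methodology fits the problem better than another.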
That’ll set you apart.
Data science will be ubiquitous across a lot of organizations.
Already many try to incorporate data science, but they don’t always know what that entails.
They hear many buzzwords, like machine learning or AI, and they want to use them. They hear they’ll save them money, or give them results faster.
Success depends on how ready and prepared organizations are for data science and what field they’re in.
In agriculture, for example, there is a plethora of data available — data from tractors, sensors on cows, or satellite data.
How do you make the best use of these? Unless you can piece them together and figure out what each data set covers in that landscape or ecosystem, it’s just… data.
Yes, ironically, that’s often the case.
Nobody walks into an organization and builds an amazing model right off the bat.
What do you need to solve? Are you missing any data? Does your data even fit the requirements? Are you trying to build daily forecasts with monthly data?
As a data scientist, you don’t just go in and solve problems. You make recommendations on multi-faceted issues so that you end up with a fantastic model.
You’ll also be advocating a better use and understanding of the data while you do that.
There are a couple of things I found really interesting about Kathryn’s story.
She was already demonstrating leadership in her job before she was asked to lead. That’s a brilliant way to go about it.
The way I go about making a podcast is that sometimes I have an idea and go looking for a guest.
Other times the opportunity to work with a specific guest is presented. We look for an idea or an overlap between the guest’s expertise and what you, the listener, might be interested in hearing about.
There’s always some kind of intent behind each podcast episode that gets published.
My intent here is to give you a look into the life of a data scientist.
Kathryn is a data scientist. She never downplayed the need for scientific method and scientific rigor but she spent a long time advocating for all the things around science.
When I think about cartography, we understand that cartography is a mixture of science and art. We’re comfortable with that when we talk about cartography.
When we talk about being a data scientist, I wonder how comfortable we are about it being a mixture here of scientific method, scientific rigor, and art.
The art side lies in the periphery, in the overlap, where we move away from the science and start thinking about how this can solve specific problems.
Kathryn talked about being an advocate, advocating for the data, helping people understand their opportunities, what they need to get to where they’re trying to go in terms of data science.
I think this requires leadership and it requires a certain amount of generosity. Plus creativity.
When it comes to the art part, it occurred to me that if you could combine science and art, and bring that combination to the work you do, that would be remarkable.
That would be the moat you could build a very successful career around.
You could build a successful business or brand.
This would be something that would be very difficult to democratize.
Let’s not do just science, but be an advocate for the science that we are doing.
When we are an advocate, demonstrate leadership, and act as an ambassador for our work, our science, our craft, it requires creativity and experimentation.
It may be helpful to think of it as our art.
Maybe that will give us the freedom we need to try different things, even things that might not work, and to experiment with intent: the intent to make our work, craft, or research more approachable and relevant to the people we are trying to help.
Be sure to subscribe to our podcast for weekly episodes that connect the geospatial community.
For more exclusive content, join our email list. No spam! Just insightful content about the geospatial industry.