Data science is the new “thing I wish I learned in school”

It seems that every generation has its “thing” when it comes to what folks wish they learned in school. For some, it’s changing a tire. For others, it’s doing your taxes or balancing a checkbook (anyone remember those?). Increasingly, in my experience, that “thing” seems to be something related to computer programming—or the analysis of data in a systematic and convincing way.

Think about what you do on a given day. Regardless of your list of tasks, I would bet that the lion’s share of them boil down to some version of “take in some information and make a decision.” Business leaders get quarterly reports on sales, revenue, expenditures, and so on, and are tasked with making nip-and-tuck decisions about how best to proceed. Small business owners of cafes, restaurants, and bakeries take stock of their inventory, customer purchase trends, and the (changing) costs of ingredients to decide what to make and how much of it to produce. Teachers, nurses, principals, you name it: for the most part, they are all processing the things happening around them and making decisions.

Believe it or not, that’s all data science.

So why not teach data science (DS) explicitly?

I doubt my main argument (that most of us are real-life data scientists making decisions under constraints and uncertainty) is going to rock the boat. It does puzzle me, then, why we don’t prioritize explicit data science instruction in more schools. In particular, I think we should be doing more to introduce confidence intervals and uncertainty, statistical inference, and the underlying concepts of probability at a much younger age.

For some, having that information could be the difference between feeling down about a particular score or letter grade and understanding that an assignment is a noisy signal of ability at a particular point in time, and that the best move is to look ahead to more opportunities to prove oneself. For another student, understanding statistics, and research design in particular, could be the difference between taking an off-hand influencer recommendation at face value (“I tried/did X and now suddenly I’m so much better at Y”) and, at a minimum, probing deeper (“I know correlation does not necessarily mean causation, and this seems to me like too strong a claim.”).

To say nothing of those studies that still make it into both news headlines and social media about the right number of coffee cups to drink in a day to avoid a heart attack. (I think it’s three. No, four. Maybe five?)

In short, learning statistics earlier would serve young adults tremendously by making them more cautious and skeptical consumers of claims and information. (Including this one!)
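To make the “noisy signal” idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption I made up for illustration: the student’s “true” ability of 85, the 8 points of noise per assignment, and the function names `simulate_scores` and `mean_with_interval`. It is a toy simulation, not a model of how any real grade works. The point is only that one score can land far from a student’s underlying ability, while the average of many scores tends to sit closer to it.

```python
import random
import statistics


def simulate_scores(true_ability, noise_sd, n_assignments, seed=0):
    """Simulate assignment scores as true ability plus random noise.

    All inputs are hypothetical; nothing here comes from real grade data.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(true_ability, noise_sd) for _ in range(n_assignments)]


def mean_with_interval(scores, z=1.96):
    """Return (mean, low, high): the average score and a rough 95% interval.

    Uses the normal approximation (z = 1.96), which is only a rough guide
    when there are few scores.
    """
    mean = statistics.mean(scores)
    std_err = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, mean - z * std_err, mean + z * std_err


# Made-up student: "true" ability of 85, with about 8 points of noise per assignment.
scores = simulate_scores(true_ability=85, noise_sd=8, n_assignments=20)

print(f"First assignment alone: {scores[0]:.1f}")
mean, low, high = mean_with_interval(scores)
print(f"Average of {len(scores)} assignments: {mean:.1f} "
      f"(rough 95% interval: {low:.1f} to {high:.1f})")
```

Try changing `n_assignments` or `noise_sd` and watch how the interval around the average widens or narrows. That is the intuition behind treating any single grade with a little caution.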

What about the academic benefits of learning DS?

Unlike other folks, I come to data science (DS) (1) having started in an era that, even though it was only a few years ago, did not frame things in explicitly “data science-y” terms and (2) from an originally non-technical background. I was going to be a lawyer.

What this combination of backgrounds allows me to do is speak to data science from a number of perspectives to best serve the needs and interests of the student. Some folks really pick up on the logic of DS: they gravitate towards the way in which inference is built around ruling out competing explanations and controlling for as many “confounders” as you can, either through modelling or through research design. Others dig the math, and dive deep into the wonky aspects of estimation (what is the right estimator to use for a count variable versus a binary variable? how about when lots of zeroes are involved?). Still others love graphs and visuals, and quickly develop the programming skills required to make compelling, useful, and, yes, aesthetically pleasing charts and figures.

Regardless of one’s chosen path or branch, learning DS skills sharpens abilities that translate into other classes. I went through this experience myself in college when working on my senior honors thesis. As we spent more time digging into research output and learning the “craft” of research, as my thesis leader, Professor Tom Wong, put it, I found myself sharpening quantitative skills (econ classes) and analytical writing skills (required classes in the humanities), and overall just developing greater clarity of thought and of verbal and written communication.

What is the best way for a young person to learn DS?

To me, the difference between learning DS at a relatively younger age (i.e., middle and high school) and learning it as a college major or on-the-job skill is the opportunity for exploration. As a young person, you have two things that older folks often don’t: (1) time and (2) the opportunity to try things. (Trust me, I am now old-ish, but I used to be young.)

What this means, in practice, is that you can explore all sorts of different angles of data science (the theoretical statistics, the graphs and visualization, the rigorous research design, etc.) and see what appeals to you. Once you’ve spent adequate time and learned skills in one domain, you can pivot to another. As with other pursuits (sports, academics in general, learning a musical instrument, etc.), practice includes both repetition (practicing a particular stroke or piano piece over and over) and generalized skill development (any kind of swimming requires a baseline level of cardio, and you can build a certain amount of finger dexterity by playing almost any piano piece).

It also means that you can try a bunch of different projects. This is the greatest benefit of starting early: trying as many real-world, real-data, real-life projects as you can and asking the question, “Do I really buy this?” Developing a healthy skepticism and then building the skills to convince your inner skeptic is, in my view, a big part of life.

How would working with you (Igor) help me achieve these skills?

At the risk of coming across like a martial arts or Star Wars movie cliche, I would mostly serve as your guide. Of course, I have formal training in many core parts of data science through my UCLA PhD, and I have spent many years working in the space. We could, on any given day, sit down and I could simply show you a bunch of things. My guess is it would be pretty interesting and you might even learn something from it.

Unfortunately, three, six, nine, twelve, or however many months down the line, my guess is you would have developed neither the independent ability nor the interest required to keep learning. In other words, there’s a good chance that, if you were paying attention, you would remember what I told you, but that doesn’t mean you would have what you need to continue developing as a data scientist (in the broad sense of that term).

That’s why I prefer a project-based to a top-down approach any day of the week. We would work together to identify a topic, find the appropriate data, and then work through the data science part—including the math and the code—together. At each stage, as often as is helpful, we would check in and you would show me what you have. With guidance, you would go back to your analysis and continue making progress on your own.

Getting started

If you found any of this convincing, and if data science is the kind of thing you want to learn (or, at least, learn more about), then just reach out via email and we can schedule a conversation. I very much look forward to speaking with you.

igorgeyn at gmail dot com