are the models lying to you?

to model or not to model: why your grandma can make better predictions

in all of the classes i teach (numerical analysis, feedback control, linear system theory, etc.), and in [almost] every paper i have written, i end up talking about models. in the classroom, i usually have to start by talking specifically about models of dynamical systems—systems that evolve in space and time.

to kick off that discussion, i bring up newton’s second law, and then i go on a small rant about how newton’s second law should really be called newton’s second model, not a law in any absolute sense. i know that sounds pedantic, but the distinction actually matters a lot—and undergrads are rarely exposed to it.

i usually start with the predictable motion of a small ball on a table through $\sum_i F_i(t) = m a(t) = m \ddot{x}(t)$, where $t$ denotes time, $m$ is the mass of the ball, and $a(t)$, $x(t)$, and $F_i(t)$ are the acceleration, displacement, and forces acting on the ball, respectively. because we know the mass of the ball, the friction coefficient of the table, the air resistance, and so on, we can do a pretty good job of predicting where the ball will end up.
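to make that concrete, here is a minimal sketch of the kind of prediction newton's second model buys you, with made-up numbers for the friction coefficient and the initial push, and only sliding friction (no air resistance):

```python
# a minimal forward-euler simulation of the ball-on-a-table model; all numbers are assumed
mu, g = 0.3, 9.81        # friction coefficient and gravity (m/s^2); the mass cancels out here
dt = 1e-3                # integration step (s)

x, v = 0.0, 1.5          # initial position (m) and speed (m/s) of the push
while v > 0:
    a = -mu * g                   # sum of forces over m: sliding friction decelerates the ball
    v = max(v + a * dt, 0.0)      # forward euler; clamp at zero so friction cannot reverse the ball
    x += v * dt

print(f"predicted stopping point: x = {x:.3f} m")   # analytic answer: v0^2 / (2*mu*g) ≈ 0.38 m
```

the point is not the numbers; it is that once you have the physics and the parameters, the prediction is boring and reliable.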

then i pivot to engineering systems: systems that we built, from tiny rlc circuits and op amps to mass-spring-dampers and water pumping stations. we have been designing these systems by hand for two or three centuries, so we know a lot about them: we know the subtleties of the math and the idiosyncrasies of the models and parameters. really, we know a lot about them because we built them. in a power transmission network, for example, the operator or utility usually knows which lines connect which nodes or buses. the operator knows the material of the transmission line and its parameters. in a drinking water network, the utility has access to the maps and files that define pipe diameters and friction coefficients.

so the story goes something like this: we built these systems, we kept the maps, parameter lists, GIS files, and so on, and while the physics is still not trivial, we can often rely on a model to predict or control the system. but we should always stay a little cynical and use data to reaffirm that the predictive model is actually valid.

this is why model calibration is still such an active area of engineering research (even if it is also reviewer 2’s favorite lazy comment because it is such an easy thing to say). the good news is that for small systems, model calibration is usually not too difficult. but for large-scale networks, the problem becomes extremely hard. that is why, if you review a paper on this topic and the authors are using ridiculous priors on the parameters, you should call them out and challenge it. otherwise it starts to feel like gambling in vegas while somehow having access to everyone else’s cards; it’s cheating.
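to give a sense of what calibration can look like at a small scale, here is a toy sketch (not from any paper; every number is invented): pretend the ball-and-table friction coefficient is unknown, push the ball at a few speeds, record noisy stopping distances, and recover the parameter with ordinary least squares. large networks are hard precisely because you rarely get to run experiments this cleanly.

```python
import numpy as np

rng = np.random.default_rng(0)
g = 9.81
mu_true = 0.3                                   # the "unknown" friction coefficient we pretend not to know

# synthetic experiments: push the ball at different speeds, record noisy stopping distances
v0 = np.linspace(0.5, 2.0, 20)                                  # initial speeds (m/s)
d = v0**2 / (2 * mu_true * g) + rng.normal(0, 0.01, v0.size)    # measured distances (m)

# the model d = v0^2 / (2*mu*g) is linear in k = 1/(2*mu*g), so least squares does the job
k, *_ = np.linalg.lstsq(v0[:, None]**2, d, rcond=None)
mu_hat = 1.0 / (2 * g * k[0])
print(f"calibrated friction coefficient: {mu_hat:.3f} (true value {mu_true})")
```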

anyway, this blog is not a rant about model calibration. it is usually at this point in class that i ask students: do you think we can predict quantities like weather, air quality, atmospheric pressure, and so on, as well as we can predict your temperature in a well-controlled room? i always get a lot of interesting answers from both sophomores and senior graduate students, but the bottom line is that most people who are not experienced in modeling will tell you not to trust predictions of these quantities. why? because we were all told that it was going to snow 12 inches tomorrow, and then we got 2 (or 24). then all the schools closed anyway, and now you are ready to flee to new zealand because you spent 4 straight days stuck at home with your siblings. you get the point. when the forecast misses by 10 standard deviations and the error bars are gaussian, you have just watched something the model said was all but impossible: the probability of the true value falling that far outside the interval is approximately $10^{-23}$, an extreme level of statistical certainty that clearly was not earned.
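for the curious, the $10^{-23}$ figure is just the two-sided gaussian tail mass beyond 10 standard deviations; a one-liner is enough to check it:

```python
import math

# two-sided gaussian tail probability beyond 10 standard deviations: 2 * (1 - Phi(10))
p_outside = math.erfc(10 / math.sqrt(2))
print(f"{p_outside:.2e}")   # ≈ 1.5e-23
```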

the broader point is that nature-based systems (climate, weather, air quality, hydrological systems) are much harder to predict. so then the natural question is: why bother predicting them at all? if you have ever started digging into climate models, you quickly realize how intractable they are. and one of the golden rules of dynamic modeling in large-scale systems is that the model should not become so complicated that analysis becomes impossible. now, if someone reading this is an anti-climate-change advocate, they are probably about to rush into thinking that i am sympathetic to their ill-intentioned cause, but i am not. two things can be true at the same time: climate models are probably not that useful, and climate change is real and burning carbon makes things worse.

i know that in today’s discourse it is hard to add any nuance to any topic, because you are either with us or against us. but i am here to offer the boring realization that, yeah, climate models are probably not all that meaningful in the way people often imagine. and most likely, if you completely ignore the climate physics and instead build an autoregressive model that predicts the evolution of climate states purely from data (past temperatures, greenhouse gas emissions, and so on), you still end up at the same trivial conclusion: things are looking bad. you probably do not need PDEs or supercomputers to simulate the global climate just to reach that punchline. your grandma can probably tell you that summers have gotten noticeably hotter over the past two or three decades, and that extreme weather is becoming more likely.
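to make the autoregressive point concrete, here is a minimal sketch on purely synthetic data (the series, the model order, everything below is invented for illustration; in practice you would also stack in exogenous inputs like emissions, but no physics enters either way):

```python
import numpy as np

rng = np.random.default_rng(1)

# purely synthetic "annual temperature anomaly" series: a slow warming trend plus noise
years = np.arange(1980, 2024)
temps = 0.02 * (years - 1980) + 0.1 * rng.standard_normal(years.size)

# fit an AR(2) model with an intercept by least squares: T[k] ≈ c + a1*T[k-1] + a2*T[k-2]
Y = temps[2:]
X = np.column_stack([np.ones(Y.size), temps[1:-1], temps[0:-2]])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)

# roll the fitted model forward a decade with zero physics in sight
history = list(temps)
for _ in range(10):
    history.append(coef[0] + coef[1] * history[-1] + coef[2] * history[-2])

print(f"last observed anomaly: {temps[-1]:.2f}, AR(2) forecast ten years out: {history[-1]:.2f}")
```

nothing in there knows about radiative forcing or ocean layers; it only knows what the data already said.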

if newton were alive, he’d probably tell you not to model the different layers of the climate and to just collect more measurements and think a little harder. but you do you. after all, your apple is different from newton’s.

and unlike today’s politics and the world (dis)order we have to endure, there is no conspiracy here. no one is lying to you. it’s just that people wanna feel good about what they do, and statisticians and my-model-is-better-than-yours zealots, very much like the mckinsey consultants, just wanna sell their ideas and products.