Transparent famine models I: Counting calories
An outsider's project to understand famine prediction models
As mentioned in my last post, some predictions estimate that there may be hundreds of thousands or even millions of deaths from famine in Sudan in the coming year or two. It may be impossible at this point to predict a precise figure, and the outcome may well depend on evolving factors such as the imposition or removal of blockades on food and other resources. But if such a catastrophe is even a possibility, it is worthwhile to do our best to model what is going on, perhaps because such models may be of use for planning, but at a minimum because they could help draw public attention to the issue.
In my research so far into models predicting famine numbers, however, I have been surprised at the extent to which the models are not transparent. That is, when two models disagree, with one claiming very high death estimates and others predicting more limited famine (there is clearly a famine already underway in some parts of Sudan), it’s unclear what drives these differences. If the data and code were publicly available, we could compare the models, see where they differ, and then identify which of these differences are in turn responsible for the different predictions they make. So that’s what I’ve set out to do, with hopes that I will find some fellow travellers along the way.
I have begun building a public repository with code, data, and (eventually) results, so that anyone interested in these issues can (a) see what methods are used and re-use the code, (b) find all sources of assumptions or data, and (c) compare different approaches or assumptions and look at how that changes the predictions. The objective is to apply Bayesian or frequentist methods to assess the sources of uncertainty in the models, and then do whatever is necessary to track down additional information to drive down that uncertainty. For now, there’s still a lot of foundational work to be done, but if you’re curious about the process of predicting famines, I invite you to follow along and contribute.
Types of famine prediction
First, a quick note: there are different approaches to predicting famines. Two basic approaches involve (a) looking for sudden increases in the prices of foods, which signal scarcity and make it impossible for people on limited budgets to buy enough food, and (b) estimating the food supplies available in a given time period and then estimating whether they are sufficient to feed the population. The first approach is probably more accurate, but seems to me to be less useful as a leading indicator. So while I hope to cover both, I will start with the second approach.
Warning!
I am not an expert in these matters (so if you are, please feel free to get in touch). I have reached out to experts in the field and gotten some feedback, but I am liable to make mistakes and take responsibility for all errors. The one thing I’m offering that doesn’t appear to exist elsewhere is transparency - all assumptions and sources and calculations will be clear to anyone. In one model I was trying to look under the hood of, the methodology description involved a step of essentially applying “expert judgment.” I don’t doubt that they are experts, and perhaps ultimately these problems are so complex and variable that the best method is human oversight and correction. But hopefully there is some value in laying out the assumptions, data, and model for others to see and improve upon, and so that is the hypothesis of this approach.
Project 1: How many calories in Sudan?
While the overall goal of the project is a careful, transparent model that examines different approaches and assumptions, to get there I needed to start with a minimum viable product, because there are a lot of moving pieces. So today I will start just by explaining one part of the process, which is to estimate the number of calories of food that we think are available in Sudan in the period from September 2023 to September 2024. This is my attempt to reconstruct the methodology described in the Clingendael reports, informed by limited conversations with an expert in the field.
If you prefer to walk through it with the data in a Jupyter notebook, you can do so at the repository here.
Here’s an overview of the process:
Get the data on the metric tons of wheat, sorghum, and millet produced in the last harvest.
Add in some data about other sources of food: aid, stocks, wild grains.
Subtract, say, 20 metric tons, since not every grain of wheat will be consumed; some will remain in stocks at the end of September, etc.
Convert the metric tons of grains to calories, using data on dry mass and correcting for the moisture content (I didn’t see explanations for this process so I put together a process based on some research I found).
Once we have the number of calories from grain, we need to add in non-grain calories (chick peas, meat, etc). I follow the approach in the Clingendael report, citing FEWS NET, which assumes 70% of calories are from grain.
That gives some 23 trillion calories, but on its own that number doesn’t mean much to me. So as a sanity check, I calculate how many days this would feed the population of the country (roughly 45-50 million) for a given daily caloric intake (say 2,000 or 2,200 calories per day). This step is not standard practice; it is just there to make sure the calculation is not orders of magnitude off. I get 215 to 262 days, depending on the details.
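To make the arithmetic in these steps concrete, here is a minimal Python sketch. It is not the code from the repository or the Clingendael methodology. The function names, the tonnages, the moisture fraction, and the energy density are all placeholder assumptions chosen for illustration. Only the 70% grain share, the rounded 23 trillion total, and the population and intake ranges come from the text above. I also read "calories" here as dietary calories (kcal), and the sketch applies one moisture and energy figure to every grain, which a real model would not do.

```python
# Back-of-envelope calorie balance. Inputs marked "placeholder" are
# hypothetical illustration values, not figures from the post or any report.


def grain_kcal(tons, moisture_fraction, kcal_per_kg_dry):
    """Convert metric tons of grain (as harvested) to kilocalories."""
    dry_kg = tons * 1000 * (1 - moisture_fraction)  # remove water weight
    return dry_kg * kcal_per_kg_dry


def total_kcal(grain_tons, other_tons, unconsumed_tons,
               moisture_fraction, kcal_per_kg_dry, grain_share):
    """Total kcal available, scaling grain kcal up by the grain share of diet."""
    net_tons = sum(grain_tons.values()) + other_tons - unconsumed_tons
    from_grain = grain_kcal(net_tons, moisture_fraction, kcal_per_kg_dry)
    return from_grain / grain_share  # add non-grain calories


def days_of_food(kcal, population, kcal_per_person_per_day):
    """Sanity check: how many days the total would feed the population."""
    return kcal / (population * kcal_per_person_per_day)


if __name__ == "__main__":
    example = total_kcal(
        grain_tons={"sorghum": 3_000_000, "millet": 700_000, "wheat": 400_000},  # placeholder
        other_tons=500_000,        # placeholder: aid, stocks, wild grains
        unconsumed_tons=100_000,   # placeholder: left in stocks, losses
        moisture_fraction=0.12,    # placeholder
        kcal_per_kg_dry=3_800,     # placeholder
        grain_share=0.70,          # share of calories from grain, as in the text
    )
    print(f"placeholder total: {example:.3e} kcal")

    # Sanity check on the rounded 23 trillion figure from the text.
    for population, intake in [(45e6, 2000), (50e6, 2200)]:
        days = days_of_food(23e12, population, intake)
        print(f"{population:.0f} people at {intake} kcal/day: {days:.0f} days")
```

With the rounded 23 trillion figure, the sanity check gives roughly 209 to 256 days, in the same ballpark as the 215 to 262 days quoted above. The gap presumably comes from rounding the total. Keeping each assumption as an explicit argument makes it easy to vary one at a time and see how much the result moves.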
Next steps
As I mentioned, the ultimate goal is not to throw out rough approximations but to investigate each assumption thoroughly and systematically. Also, the last step is not normal practice, so in the next post I will instead consider an approach to modeling population estimates, how these calories are distributed (unevenly) across that population, body-mass index calculations, and finally estimated excess mortality. Finally, once the basics of the model are in place, I hope to turn the discussion to the political and conflict factors that might well determine the outcome as much as these technical details. Stay tuned!