# Excess Deaths During the Coronavirus Pandemic

The New York Times is releasing data that documents the number of deaths from all causes that have occurred during the coronavirus pandemic in 28 countries. We are compiling this time series data from national and municipal health departments, vital statistics offices and other official sources in order to better understand the true toll of the pandemic and provide a record for researchers and the public.

Official Covid-19 death tolls offer a limited view of the impact of the outbreak because they often exclude people who have not been tested and those who died at home. All-cause mortality is widely used by demographers and other researchers to understand the full impact of deadly events, including epidemics, wars and natural disasters. The totals in this data include deaths from Covid-19 as well as those from other causes, likely including people who could not be treated or did not seek treatment for other conditions.

We have used this data to produce [graphics tracking](https://www.nytimes.com/interactive/2020/04/21/world/coronavirus-missing-deaths.html) [the outbreak’s toll](https://www.nytimes.com/interactive/2020/06/10/world/coronavirus-history.html) and stories about [the United States](https://www.nytimes.com/interactive/2020/05/05/us/coronavirus-death-toll-us.html), [Ecuador](https://www.nytimes.com/2020/04/23/world/americas/ecuador-deaths-coronavirus.html), [Russia](https://www.nytimes.com/2020/05/11/world/europe/coronavirus-deaths-moscow.html), [Turkey](https://www.nytimes.com/2020/04/20/world/middleeast/coronavirus-turkey-deaths.html), [Sweden](https://www.nytimes.com/interactive/2020/05/15/world/europe/sweden-coronavirus-deaths.html) and [other countries](https://www.nytimes.com/2020/05/12/world/americas/latin-america-virus-death.html). We would like to thank a number of demographers and other researchers, listed at the end, who have provided data or helped interpret it.

## Country and City-Level Data

The number of all-cause deaths recorded in each area, by week or month, can be found in the **[deaths.csv](deaths.csv)** file. ([Raw CSV](https://raw.githubusercontent.com/nytimes/covid-19-data/master/excess-deaths/deaths.csv)) For weekly data, the first and last weeks of the year, which are often partial weeks, were excluded.

```
country,placename,frequency,start_date,end_date,year,month,week,deaths,expected_deaths,excess_deaths,baseline
France,,weekly,2020-04-27,2020-05-03,2020,4,18,10498,10357,141,2010-2018 weekly average
```

Some of the data is only available at the city level.

```
country,placename,frequency,start_date,end_date,year,month,week,deaths,expected_deaths,excess_deaths,baseline
Turkey,Istanbul,weekly,2020-04-06,2020-04-12,2020,4,15,2193,1429,764,2018-2019 weekly average
```

The deaths fields have the following definitions:

**deaths**: The total number of confirmed deaths recorded from any cause.  
**expected_deaths**: The baseline number of expected deaths, calculated from a historical average. See [expected deaths](#expected-deaths).  
**excess_deaths**: The number of deaths minus the expected deaths.
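
To check these definitions against the file, here is a minimal sketch using only Python's standard library. It parses rows in the layout shown above (the two sample rows are copied from the examples) and confirms that `excess_deaths` equals `deaths` minus `expected_deaths`. The helper names `to_number` and `load_rows` are hypothetical, the one-death rounding tolerance is an assumption about how published figures may be rounded, and reading the live file from the Raw CSV link (commented out) assumes network access.

```python
import csv
import io

# The two sample rows shown above, in the deaths.csv column layout.
SAMPLE = """country,placename,frequency,start_date,end_date,year,month,week,deaths,expected_deaths,excess_deaths,baseline
France,,weekly,2020-04-27,2020-05-03,2020,4,18,10498,10357,141,2010-2018 weekly average
Turkey,Istanbul,weekly,2020-04-06,2020-04-12,2020,4,15,2193,1429,764,2018-2019 weekly average
"""

COUNT_FIELDS = ("deaths", "expected_deaths", "excess_deaths")


def to_number(value):
    """Parse a count cell; blank cells become None (hypothetical helper)."""
    value = value.strip()
    return float(value) if value else None


def load_rows(text_stream):
    """Read deaths.csv-style rows, converting the count columns to numbers."""
    rows = []
    for row in csv.DictReader(text_stream):
        for field in COUNT_FIELDS:
            row[field] = to_number(row[field])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))

# To read the published file instead (requires network access):
# from urllib.request import urlopen
# url = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/excess-deaths/deaths.csv"
# with urlopen(url) as resp:
#     rows = load_rows(io.TextIOWrapper(resp, encoding="utf-8"))

for row in rows:
    deaths, expected, excess = (row[f] for f in COUNT_FIELDS)
    if None not in (deaths, expected, excess):
        # excess_deaths is deaths minus expected_deaths; allow for rounding.
        assert abs(excess - (deaths - expected)) <= 1
    place = row["placename"] or row["country"]  # placename is blank for countries
    print(place, excess)
```

City-level rows carry a `placename`; country-level rows leave it blank, which is why the sketch falls back to `country`.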

The time fields have the following definitions:

**frequency**: Weekly or monthly, depending on how the data is recorded.  
**start_date**: The first date included in the period.  
**end_date**: The last date included in the period.  
**month**: Numerical month (1-12).  
**week**: Epidemiological week, which is a standardized way of counting weeks to allow for year-over-year comparisons. Most countries start epi weeks on Mondays, but some start on other days.  
**baseline**: The years used to calculate expected_deaths.
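
The CSV examples above are consistent with ISO 8601 weeks, which start on Monday: Python's `date.isocalendar()` puts 2020-04-27 in week 18 and 2020-04-06 in week 15, matching the France and Istanbul rows. The sketch below assumes ISO weeks, so it will not match countries whose epi weeks start on another day. The helper names are hypothetical; `crosses_year_boundary` shows why the first and last weeks of a year are often partial: a seven-day period can straddle January 1.

```python
from datetime import date, timedelta


def iso_week(start_date: date) -> tuple[int, int]:
    """Return (ISO year, ISO week number) for a Monday-start week (assumption)."""
    iso = start_date.isocalendar()
    return iso[0], iso[1]


def crosses_year_boundary(start_date: date) -> bool:
    """True if the seven-day period beginning on start_date spans two calendar years."""
    end_date = start_date + timedelta(days=6)
    return start_date.year != end_date.year


print(iso_week(date(2020, 4, 27)))                # → (2020, 18), the France row
print(iso_week(date(2020, 4, 6)))                 # → (2020, 15), the Istanbul row
print(crosses_year_boundary(date(2019, 12, 30)))  # → True: ISO week 1 of 2020 starts in 2019
```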

## Methodology

The data is the product of journalists in a number of countries who monitor official data releases and ask government officials for information. We have consulted with demographers, medical officials and local sources to confirm that this data is broadly representative of how many people have died. In some countries, burial counts, hospital deaths or other indicators are used to confirm that the underlying trends are representative.

But mortality data in the middle of a pandemic is not perfect. Many countries have not yet published any data on all-cause mortality. And during a pandemic, normal patterns of death registration may be disrupted, which could change how many deaths are captured.

Most of the countries in this dataset have widespread vital statistics coverage. But many low-income countries have [unreliable death registration systems](https://twitter.com/helleringer143/status/1261868447903948800), making it very difficult to assess their levels of excess mortality. A rough guide to the historical completeness of death registration systems by country is available from the [United Nations](https://unstats.un.org/unsd/demographic-social/crvs/documents/Website_final_coverage.xls).

Some countries are publishing mortality data faster than normal in order to understand how mortality is changing. That means data, especially for recent time periods, may be revised. Revisions are usually upward, as more deaths are reported.

Expected deaths for [the United States](https://www.nytimes.com/interactive/2020/05/05/us/coronavirus-death-toll-us.html) were calculated with a simple model based on the number of all-cause deaths from 2015 to 2019 released by the Centers for Disease Control and Prevention, adjusted to account for trends over time, like population changes.

Our analysis aims to show mortality statistics for as much of the United States as possible, but it is limited to those states where mortality data is sufficiently complete.

Some states are so far behind in submitting death certificates to the C.D.C. that the C.D.C. does not recommend relying on their recent death reporting. In Pennsylvania and Ohio, for example, death reporting appears to have lagged far behind its normal rate all year, according to the C.D.C., even though those states usually report more promptly. We have excluded data from those two states, as well as from Alaska, Connecticut, Louisiana, North Carolina, Puerto Rico, Rhode Island and West Virginia.
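
Limiting the analysis to states with sufficiently complete data amounts to dropping the lagging jurisdictions before aggregating. A minimal sketch, assuming state-level excess deaths are held in a plain dictionary. The excluded names come from the paragraph above, while `total_excess` and the counts in the usage line are hypothetical placeholders, not figures from this dataset.

```python
# Jurisdictions excluded above because their recent reporting lags.
EXCLUDED = {
    "Alaska", "Connecticut", "Louisiana", "North Carolina", "Ohio",
    "Pennsylvania", "Puerto Rico", "Rhode Island", "West Virginia",
}


def total_excess(excess_by_state: dict[str, int], excluded: set[str] = EXCLUDED) -> int:
    """Sum excess deaths over jurisdictions that are not excluded (hypothetical helper)."""
    return sum(n for state, n in excess_by_state.items() if state not in excluded)


# Placeholder values for illustration only.
example = {"New York": 100, "Ohio": 40, "Texas": 25}
print(total_excess(example))  # → 125 (Ohio is dropped)
```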

See [Data Sources](#data-sources) below for the source of data for each country and city in this dataset.

## Expected Deaths

We have calculated an average number of expected deaths for each area based on historical data for the same time of year. These expected deaths are the basis for our [excess death calculations](https://www.nytimes.com/interactive/2020/04/21/world/coronavirus-missing-deaths.html), which estimate how many more people have died this year than in an average year.
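
The simplest version of "historical data for the same time of year" is a plain average of the deaths recorded in the same week or month across the baseline years, which is what `baseline` values such as "2010-2018 weekly average" describe. A minimal sketch with hypothetical inputs; the function name and the counts are illustrative only, and the fitted model described in the next paragraph adds a time trend on top of this idea.

```python
from statistics import mean


def same_period_average(history: dict[tuple[int, int], float], period: int, years: range) -> float:
    """Average deaths for one week or month number across the baseline years.

    history maps (year, period) -> deaths, where period is a week or month number.
    """
    return float(mean(history[(year, period)] for year in years))


# Hypothetical week-15 counts for 2015-2019, for illustration only.
history = {(2015, 15): 1000, (2016, 15): 1020, (2017, 15): 990, (2018, 15): 1010, (2019, 15): 980}
expected = same_period_average(history, period=15, years=range(2015, 2020))
print(expected)  # → 1000.0
```

Excess deaths for that week would then be the observed count minus `expected`.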

To estimate expected deaths, we fit a linear model to the reported deaths in each country from earlier years to January 2020. The model has two components: a linear time trend to account for demographic changes and a smoothing spline to account for seasonal variation. For countries limited to monthly data, the model includes month as a fixed effect rather than using a smoothing spline.
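
For the monthly case, the model's structure can be sketched with NumPy least squares: a linear time trend plus one indicator column per calendar month (the month fixed effect). This is an illustration under stated assumptions, not the code used for this dataset: the smoothing spline used for weekly data is not shown, the data is synthetic, and the function names are hypothetical.

```python
import numpy as np


def design_matrix(t, month):
    """Columns: a linear time trend, then one 0/1 indicator per calendar month.

    The twelve month indicators together act as the intercept, so no separate
    constant column is added (that would make the matrix rank-deficient).
    """
    indicators = (month[:, None] == np.arange(1, 13)[None, :]).astype(float)
    return np.column_stack([t, indicators])


def fit_expected(t, month, deaths):
    """Least-squares coefficients for deaths ~ trend + month fixed effects."""
    coef, *_ = np.linalg.lstsq(design_matrix(t, month), deaths, rcond=None)
    return coef


def predict_expected(coef, t, month):
    """Expected deaths for new periods from the fitted coefficients."""
    return design_matrix(t, month) @ coef


# Synthetic monthly data for 2015-2019 (not real figures):
# a seasonal pattern plus a slow upward trend of 2 deaths per month.
t = np.arange(60, dtype=float)          # months since January 2015
month = np.arange(60) % 12 + 1          # calendar month, 1..12
deaths = 1000 + 200 * np.cos(2 * np.pi * (month - 1) / 12) + 2.0 * t

coef = fit_expected(t, month, deaths)

# Expected deaths for January-April 2020 (t = 60..63), extending the trend.
t_new = np.arange(60, 64, dtype=float)
month_new = np.array([1, 2, 3, 4])
print(np.round(predict_expected(coef, t_new, month_new)))
```

Because the synthetic series is built exactly from a trend plus monthly effects, the fit recovers a trend of 2 deaths per month; with real data, the residuals would reflect year-to-year noise.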

The number of expected deaths is not adjusted for how non-Covid-19 deaths may change during the outbreak, which will take some time to figure out. As countries impose control measures, deaths from causes like road accidents and homicides may decline. And people who die from Covid-19 cannot die later from [other causes](https://twitter.com/AndrewNoymer/status/1241620305350549504), which may reduce the number of deaths from those causes. If either factor plays a role, the baselines would be too high, so these figures would understate, rather than overstate, the number of excess deaths.
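
The direction of that bias can be checked with simple arithmetic. The sketch below uses made-up numbers and a simplifying assumption (observed deaths = baseline + Covid-19 deaths + change in other-cause deaths): if deaths from other causes fall while the baseline is left unchanged, the measured excess is smaller than the number of additional Covid-19 deaths.

```python
def measured_excess(baseline: float, covid_deaths: float, other_cause_change: float) -> float:
    """Observed deaths minus the unadjusted baseline.

    Assumes observed = baseline + covid_deaths + other_cause_change, where
    other_cause_change is negative if, e.g., road deaths fall.
    """
    observed = baseline + covid_deaths + other_cause_change
    return observed - baseline


# Hypothetical figures: 1,000 expected deaths, 300 Covid-19 deaths,
# and 50 fewer deaths than usual from other causes.
print(measured_excess(1000, 300, -50))  # → 250, below the 300 Covid-19 deaths
```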

The number of years used in the expected deaths calculation depends on what data is available. See [Data Sources](#data-sources) for the years used to calculate each baseline.

## Data Sources

**Austria**

Source: [Statistics Austria](http://www.statistik.at/web_de/statistiken/menschen_und_gesellschaft/bevoelkerung/gestorbene/index.html)  
Baseline years: 2015-2019  
Data frequency: weekly

**Belgium**

Source: Sciensano publishes a [weekly report](https://covid-19.sciensano.be/fr/covid-19-situation-epidemiologique). Additional historical mortality data comes from the [Belgian Mortality Monitoring](https://epistat.wiv-isp.be/momo/) dashboard.  
Baseline years: 2016-2019  
Data frequency: weekly

**Brazil**

Source: Data for six cities in Brazil (São Paulo, Rio de Janeiro, Fortaleza, Manaus, Recife and Belém) is from the [Registro Civil](https://registrocivil.org.br/) and the Ministry of Health.  
Baseline years: 2016-2019  
Data frequency: monthly

**Denmark**

Source: [Statistics Denmark](https://www.statbank.dk/dodc2)  
Baseline years: 2015-2019  
Data frequency: weekly

**Ecuador**

Source: [General Direction of Civil Registry](https://www.registrocivil.gob.ec/cifras/)  
Baseline years: 2017-2019 (2019 data is only available for January-April)  
Data frequency: monthly

**Finland**

Source: [Statistics Finland](https://pxnet2.stat.fi/PXWeb/pxweb/en/Kokeelliset_tilastot/Kokeelliset_tilastot__vamuu_koke/statfin_vamuu_pxt_12ng.px/)  
Baseline years: 2015-2019  
Data frequency: weekly

**France**

Source: INSEE ([2018-2020 data](https://www.insee.fr/fr/statistiques/4487988?sommaire=4487854))  
Baseline years: 2010-2019  
Data frequency: weekly

**Germany**

Source: [Federal Statistics Office](https://www.destatis.de/EN/Themes/Society-Environment/Population/Deaths-Life-Expectancy/_node.html;jsessionid=91286BFEECCABAD3052B72D2C2760F99.internet8732)  
Baseline years: 2016-2019  
Data frequency: weekly

**Jakarta, Indonesia**

Source: [Jakarta’s Department of Parks and Cemeteries](https://pertamananpemakaman.jakarta.go.id/v140/t15)  
Baseline years: 2010-2019  
Data frequency: monthly burials

**Israel**

Source: [Population and Immigration Authority](https://www.gov.il/BlobFolder/news/death_stats_2001_2020/he/death_stats_2001_2020.pdf)  
Baseline years: 2015-2019  
Data frequency: monthly

**Italy**

Source: [The Italian National Institute of Statistics](https://www.istat.it/en/archivio/240106)  
Baseline years: 2015-2019 monthly average. Historical data is only available as a four-year average from January 1 through March 31.  
Data frequency: monthly

**Netherlands**

Source: [Statistics Netherlands](https://opendata.cbs.nl/#/CBS/en/dataset/70895ENG/table?ts=1588591754264)  
Baseline years: 2016-2019  
Data frequency: weekly

**Norway**

Source: [Statistics Norway](https://www.ssb.no/statbank/table/07995/)  
Baseline years: 2015-2019  
Data frequency: weekly

**Mexico City, Mexico**

Source: Death certificate records for 2019 and 2020 from the [General Directorate of the Civil Registry](http://www.rcivil.cdmx.gob.mx/solicitudactas/busqueda/registrales?clase_acta=DEFUNCION), collected by [Mario Romero and Laurianne Despeghel](https://github.com/mariorz/folio-deceso); 2016-2018 mortality data from the National Institute of Statistics, Geography and Informatics (INEGI)  
Baseline years: 2016-2019  
Data frequency: weekly

**Peru**

Source: [Mortality Information System](https://www.minsa.gob.pe/defunciones/) (Sinadef) for 2017-2020; Health Ministry for 2016.  
Baseline years: 2017-2019  
Data frequency: monthly

**Portugal**

Source: Eurostat  
Baseline years: 2015-2019  
Data frequency: weekly

**Moscow, Russia**

Source: [Moscow City Government](https://data.mos.ru/opendata/7704111479-dinamika-registratsii-aktov-grajdanskogo-sostoyaniya?pageNumber=13&versionNumber=3&releaseNumber=42&fbclid=IwAR23dK1YBLeGipw4UPg4hi_w6cDOE94fuZ0Z7lwx28u-rAZCEoqAAaIQpF8)  
Baseline years: 2015-2019  
Data frequency: monthly

**South Korea**

Source: [Statistics Korea](http://kosis.kr/statisticsList/statisticsListIndex.do?menuId=M_01_01&vwcd=MT_ZTITLE&parmTabId=M_01_01#SelectStatsBoxDiv)  
Baseline years: 2015-2019  
Data frequency: monthly

**Sweden**

Source: [Statistics Sweden](https://www.scb.se/en/About-us/news-and-press-releases/statistics-sweden-to-publish-preliminary-statistics-on-deaths-in-sweden/)  
Baseline years: 2015-2019  
Data frequency: weekly

**Switzerland**

Source: [Federal Statistics Bureau](https://www.bfs.admin.ch/bfs/fr/home/statistiques/sante/etat-sante/mortalite-causes-deces.html)  
Baseline years: 2016-2019  
Data frequency: weekly

**Thailand**

Sources: [Bureau of Registration Administration](https://www.cdg.co.th/website/en/industries/government/civil-registration-and-the-national-identification-card-system-the-bureau-of-registration-administration-the-department-of-provincial-administration-2/); [Department of Provincial Administration](https://www.dopa.go.th/main/web_index)  
Baseline years: 2015-2019  
Data frequency: monthly

**United Kingdom**

Sources: [Office for National Statistics](https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/weeklyprovisionalfiguresondeathsregisteredinenglandandwales); [National Records of Scotland](https://www.nrscotland.gov.uk/covid19stats); [Northern Ireland Statistics and Research Agency](https://www.nisra.gov.uk/publications/weekly-deaths).  
Baseline years: 2010-2019  
Data frequency: weekly

**United States**

Source: [Centers for Disease Control and Prevention](https://www.cdc.gov/nchs/nvss/vsrr/covid_weekly/)  
Baseline years: 2015-2019  
Data frequency: weekly

**Boston, United States**

Source: [Massachusetts Department of Public Health](https://www.mass.gov/lists/death-data)  
Baseline years: 2015-2019  
Data frequency: weekly

**Chicago, United States**

Source: [Illinois Department of Public Health](https://www.dph.illinois.gov/data-statistics/vital-statistics/death-statistics)  
Baseline years: 2017-2019  
Data frequency: weekly

**Denver, United States**

Source: [Colorado Department of Public Health and Environment](https://www.colorado.gov/pacific/cdphe/vital-statistics-program)  
Baseline years: 2017-2019  
Data frequency: monthly

**Detroit, United States**

Source: [Michigan Department of Health and Human Services](https://www.michigan.gov/mdhhs/0,5885,7-339-73970_2944_4669_4686---,00.html)  
Baseline years: 2017-2019  
Data frequency: weekly

**Miami, United States**

Source: [Florida Department of Health](http://www.floridahealth.gov/statistics-and-data/index.html)  
Baseline years: 2015-2019  
Data frequency: monthly

**New York City, United States**

Source: [Centers for Disease Control and Prevention](https://gis.cdc.gov/grasp/fluview/mortality.html)  
Baseline years: 2015-2019  
Data frequency: weekly

## Other Collections of All-Cause Mortality Data

[The Human Mortality Database](https://www.mortality.org/) includes recent all-cause death counts collected by demographers at the Max Planck Institute for Demographic Research and other institutions. [The Economist](https://github.com/TheEconomist/covid-19-excess-deaths-tracker) and the [Financial Times](https://github.com/Financial-Times/coronavirus-excess-mortality-data) are also publicly releasing their data on all-cause mortality.

## License and Attribution

This data is licensed under the same terms as our Coronavirus Data in the United States dataset. In general, we are making this data publicly available for broad, noncommercial public use, including by medical and public health researchers, policymakers, analysts and local news media.

If you use this data, you must attribute it to “The New York Times” in any publication. If you would like a more detailed description of the data, you could say “Data from The New York Times, based on reports from national and municipal health agencies.”

If you use it in an online presentation, we would appreciate it if you would link to our graphic tracking these deaths: [https://www.nytimes.com/interactive/2020/04/21/world/coronavirus-missing-deaths.html](https://www.nytimes.com/interactive/2020/04/21/world/coronavirus-missing-deaths.html).

If you use this data, please let us know at covid-data@nytimes.com.

See our [LICENSE](LICENSE) for the full terms of use for this data.

## Contact Us

If you have questions about the data or licensing conditions, please contact us at:

covid-data@nytimes.com

## Contributors

Allison McCann, Jin Wu, Josh Katz and Denise Lu have been leading our data collection efforts.

Elian Peltier contributed reporting from Paris; Muktita Suhartono from Bangkok; Carlotta Gall from Istanbul; Anatoly Kurmanaev from Caracas, Venezuela; Monika Pronczuk from Brussels; José María León Cabrera from Quito, Ecuador; Irit Pazner from Jerusalem; Mirelis Morales from Lima; and Manuela Andreoni from Rio de Janeiro.

Thank you to Stéphane Helleringer, Johns Hopkins University; Tim Riffe, Max Planck Institute for Demographic Research; Lasse Skafte Vestergaard, EuroMOMO; Vladimir Shkolnikov, Max Planck Institute for Demographic Research; Jenny Garcia, Institut National d'Études Démographiques; Tom Moultrie, University of Cape Town; Isaac Sasson, Tel Aviv University; Patrick Gerland, United Nations; S V Subramanian, Harvard University; Paulo Lotufo, University of São Paulo; and Marcelo Oliveira.
286  
287  Thank you to Stéphane Helleringer, Johns Hopkins University; Tim Riffe, Max Planck Institute for Demographic Research; Lasse Skafte Vestergaard, EuroMOMO; Vladimir Shkolnikov, Max Planck Institute for Demographic Research; Jenny Garcia, Institut National d'Études Démographiques; Tom Moultrie, University of Cape Town; Isaac Sasson, Tel Aviv University; Patrick Gerland, United Nations; S V Subramanian, Harvard University; Paulo Lotufo, University of São Paulo; and Marcelo Oliveira.