Monash Time Series Forecasting Repository

The first repository containing datasets of related time series for global forecasting


Our Aim


Our aim is to introduce the first comprehensive time series forecasting repository containing datasets of related time series, to facilitate the evaluation of global forecasting models. All datasets are intended for research purposes only. Our repository contains 26 publicly available time series datasets, covering both equal-length and variable-length time series. Many datasets have different versions based on the frequency and the inclusion of missing values, bringing the total number of dataset variations to 50. Furthermore, it includes both real-world and competition time series datasets covering varied domains.

We have also characterised each dataset and evaluated several baseline methods on them.

We recommend reading our paper for a detailed discussion of the datasets, their original sources, feature analysis and baseline evaluation. If you use our work, please cite the paper "Monash Time Series Forecasting Archive" by Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman and Pablo Montero-Manso.

@misc{godahewa2021monash,
  author = "Godahewa, Rakshitha and Bergmeir, Christoph and Webb, Geoffrey I. and Hyndman, Rob J. and Montero-Manso, Pablo",
  title = "Monash Time Series Forecasting Archive",
  howpublished = "\url{https://arxiv.org/abs/2105.06643}",
  year = "2021"
}
          

Datasets


The following table lists the time series datasets currently available in our archive. The datasets are provided in .tsf format, a new format we propose for storing time series data, following the approach pioneered by the sktime .ts format. Wrappers for loading the data into R and Python environments are available in our GitHub repository.

| Dataset | Domain | No. of Series | Min. Length | Max. Length | Competition | Download (available versions) | Source |
|---|---|---|---|---|---|---|---|
| M1 | Multiple | 1001 | 15 | 150 | Yes | Yearly, Quarterly, Monthly | Makridakis et al., 1982 |
| M3 | Multiple | 3003 | 20 | 144 | Yes | Yearly, Quarterly, Monthly, Other | Makridakis and Hibon, 2000 |
| M4 | Multiple | 100000 | 19 | 9933 | Yes | Yearly, Quarterly, Monthly, Weekly, Daily, Hourly | Makridakis et al., 2020 |
| Tourism | Tourism | 1311 | 11 | 333 | Yes | Yearly, Quarterly, Monthly | Athanasopoulos et al., 2011 |
| NN5 | Banking | 111 | 791 | 791 | Yes | Daily W Missing, Daily W/O Missing, Weekly | Taieb et al., 2012 |
| CIF 2016 | Banking | 72 | 34 | 120 | Yes | Monthly | Stepnicka and Burda, 2017 |
| Web Traffic | Web | 145063 | 803 | 803 | Yes | Daily W Missing, Daily W/O Missing, Weekly | Google, 2017 |
| Solar | Energy | 137 | 52560 | 52560 | No | 10 Minutes, Weekly | Solar, 2020 |
| Electricity | Energy | 321 | 26304 | 26304 | No | Hourly, Weekly | UCI, 2020 |
| London Smart Meters | Energy | 5560 | 288 | 39648 | No | W Missing, W/O Missing | Jean-Michel, 2019 |
| Wind Farms | Energy | 339 | 6345 | 527040 | No | W Missing, W/O Missing | AEMO, 2020 |
| Car Parts | Sales | 2674 | 51 | 51 | No | W Missing, W/O Missing | Hyndman, 2015 |
| Dominick | Sales | 115704 | 28 | 393 | No | Weekly | James M. Kilts Center, 2020 |
| FRED-MD | Economic | 107 | 728 | 728 | No | Monthly | McCracken and Ng, 2016 |
| San Francisco Traffic | Transport | 862 | 17544 | 17544 | No | Hourly, Weekly | Caltrans, 2020 |
| Pedestrian Counts | Transport | 66 | 576 | 96424 | No | Hourly | City of Melbourne, 2020 |
| Hospital | Health | 767 | 84 | 84 | No | Monthly | Hyndman, 2015 |
| COVID Deaths | Nature | 266 | 212 | 212 | No | Daily | Johns Hopkins University, 2020 |
| KDD Cup | Nature | 270 | 9504 | 10920 | Yes | W Missing, W/O Missing | KDD Cup, 2018 |
| Weather | Nature | 3010 | 1332 | 65981 | No | Daily | Sparks et al., 2020 |
| Sunspot | Nature | 1 | 73931 | 73931 | No | W Missing, W/O Missing | Sunspot, 2015 |
| Saugeen River Flow | Nature | 1 | 23741 | 23741 | No | Daily | McLeod and Gweon, 2013 |
| US Births | Nature | 1 | 7305 | 7305 | No | Daily | Pruim et al., 2020 |
| Electricity Demand | Energy | 1 | 17520 | 17520 | No | Half Hourly | Hyndman, 2018 |
| Solar Power | Energy | 1 | 7397222 | 7397222 | No | 4 Seconds | AEMO, 2020 |
| Wind Power | Energy | 1 | 7397147 | 7397147 | No | 4 Seconds | AEMO, 2020 |
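
To give a feel for how a .tsf file might be read, here is a minimal Python sketch. It is not the loader from our GitHub repository, and the layout it assumes is not spelled out on this page: header lines starting with `@`, comment lines starting with `#`, an `@data` marker, and one series per line with colon-separated attribute values followed by comma-separated observations, where `?` marks a missing value. The function name `parse_tsf` and the sample content are hypothetical.

```python
def parse_tsf(lines):
    """Parse .tsf-style text lines into (metadata dict, list of series records).

    Assumed layout (an illustration, not the official specification):
      - lines starting with '#' are comments
      - '@attribute <name> <type>' declares a per-series attribute
      - other '@key value' header lines hold metadata
      - after '@data', each line is 'attr1:attr2:...:v1,v2,...' with '?' for missing
    """
    meta, attributes, series = {}, [], []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not in_data:
            if line.lower() == "@data":
                in_data = True
            elif line.startswith("@attribute"):
                _, name, _type = line.split(" ", 2)
                attributes.append(name)
            elif line.startswith("@"):
                key, _, value = line[1:].partition(" ")
                meta[key] = value
            continue
        fields = line.split(":")
        # The last field holds the observations; the rest are attribute values.
        values = [None if v == "?" else float(v) for v in fields[-1].split(",")]
        record = dict(zip(attributes, fields[:-1]))
        record["values"] = values
        series.append(record)
    return meta, series


# Hypothetical two-series example with one missing observation.
sample = """\
# Illustrative content only
@relation demo
@attribute series_name string
@attribute start_timestamp date
@frequency monthly
@missing true
@equallength false
@data
T1:2001-01-01 00-00-00:10.5,11.0,?,12.25
T2:2001-03-01 00-00-00:3.0,4.0
"""

meta, series = parse_tsf(sample.splitlines())
print(meta["frequency"], len(series), series[0]["values"])
```

For real work, the R and Python wrappers in the GitHub repository are the better choice, since they handle details this sketch ignores.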

Results


We have evaluated the performance of 7 baseline forecasting methods across the datasets in our repository. The baselines comprise 6 traditional univariate forecasting models, namely Simple Exponential Smoothing (SES), Theta (Assimakopoulos and Nikolopoulos, 2000), Exponential Smoothing (ETS, Hyndman, 2008), Auto-Regressive Integrated Moving Average (ARIMA, Box and Jenkins, 1990), Trigonometric Box-Cox ARMA Trend Seasonal (TBATS, Livera et al., 2011) and Dynamic Harmonic Regression ARIMA (DHR-ARIMA, Hyndman and Athanasopoulos, 2021), together with a globally trained Pooled Regression model (PR, Trapero et al., 2015). For evaluation, we use 4 error metrics: the symmetric Mean Absolute Percentage Error (sMAPE), the Mean Absolute Scaled Error (MASE, Hyndman and Koehler, 2006), the Mean Absolute Error (MAE, Sammut and Webb, 2010) and the Root Mean Squared Error (RMSE). The sMAPE was calculated in 2 ways: the original version and the modified version suggested by Suilin (2017).
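
As an illustration of the simplest baseline, the sketch below implements textbook Simple Exponential Smoothing in Python. It is not the implementation used for our experiments: the smoothing parameter `alpha` is fixed by hand here and the level is initialised with the first observation, both of which are simplifying assumptions, and the function name `ses_forecast` is hypothetical.

```python
def ses_forecast(y, alpha, horizon):
    """Simple Exponential Smoothing with a fixed smoothing parameter.

    The level is initialised with the first observation (an assumption) and
    updated as level = alpha * obs + (1 - alpha) * level. SES produces a flat
    forecast: every future step equals the final level.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * horizon


print(ses_forecast([10, 12, 11, 13], alpha=0.5, horizon=3))  # → [12.0, 12.0, 12.0]
```

In practice `alpha` would normally be estimated from the data rather than chosen by hand.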

Please refer to our paper for more details of these baseline methods and error metrics.
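
The error metrics can be sketched in a few lines of Python. The functions below follow the standard textbook definitions and are not taken from our evaluation code; the function names, the `season` parameter and the toy numbers are assumptions made for illustration. The modified sMAPE is not shown.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def smape(actual, forecast):
    """Original sMAPE in percent; undefined if an actual and its forecast are both 0."""
    terms = (2 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast))
    return 100 * sum(terms) / len(actual)


def mase(actual, forecast, train, season=1):
    """Forecast MAE scaled by the in-sample MAE of the seasonal naive forecast."""
    diffs = [abs(train[i] - train[i - season]) for i in range(season, len(train))]
    scale = sum(diffs) / len(diffs)
    return mae(actual, forecast) / scale


# Toy example: four training observations, two test observations.
train = [10, 12, 11, 13]
actual = [14, 12]
forecast = [12, 12]

print(mae(actual, forecast))                     # 1.0
print(round(rmse(actual, forecast), 4))          # 1.4142
print(round(smape(actual, forecast), 4))         # 7.6923
print(round(mase(actual, forecast, train), 4))   # 0.6
```

A MASE below 1 means the forecast is, on average, more accurate than the in-sample seasonal naive forecast, which is why it allows comparison across series with different scales.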

The following table shows the mean MASE of each baseline method across the datasets in our repository. Results for the other error metrics are available in the online appendix.

| Dataset | SES | Theta | ETS | ARIMA | TBATS | DHR-ARIMA | PR |
|---|---|---|---|---|---|---|---|
| NN5 Daily | 1.521 | 0.885 | 0.865 | 1.013 | - | - | 1.263 |
| NN5 Weekly | 0.903 | 0.885 | - | - | 0.872 | 0.887 | 0.854 |
| CIF 2016 | 1.291 | 0.997 | 0.841 | 0.929 | - | - | 1.019 |
| US Births | 4.343 | 2.138 | 1.529 | 1.917 | - | - | 2.094 |
| Saugeen River Flow | 1.426 | 1.425 | 2.036 | 1.485 | - | - | 1.674 |
| Elecdemand | 1.126 | 1.125 | - | - | 1.272 | 0.902 | 1.153 |
| Kaggle Daily | 0.924 | 0.928 | 1.231 | 0.890 | - | - | - |
| Kaggle Weekly | 0.698 | 0.694 | - | - | 0.622 | 0.815 | 1.021 |
| Tourism Yearly | 3.253 | 3.015 | 3.395 | 3.775 | - | - | 3.516 |
| Tourism Quarterly | 3.210 | 1.661 | 1.592 | 1.782 | - | - | 1.643 |
| Tourism Monthly | 3.306 | 1.649 | 1.526 | 1.589 | - | - | 1.678 |
| Traffic Hourly | 1.922 | 1.922 | - | - | 2.482 | 2.535 | 1.281 |
| Traffic Weekly | 1.116 | 1.121 | - | - | 1.148 | 1.191 | 1.122 |
| Electricity Hourly | 4.544 | 4.545 | - | - | 3.690 | 4.602 | 2.912 |
| Electricity Weekly | 1.536 | 1.476 | - | - | 0.792 | 0.878 | 0.916 |
| Solar 10 Minutes | 1.451 | 1.452 | - | - | 3.936 | 1.034 | 1.451 |
| Solar Weekly | 1.215 | 1.224 | - | - | 0.916 | 0.848 | 1.053 |
| Sunspot | 0.128 | 0.128 | 0.128 | 0.067 | - | - | 0.099 |
| M1 Yearly | 4.938 | 4.191 | 3.771 | 4.479 | - | - | 4.588 |
| M1 Quarterly | 1.929 | 1.702 | 1.658 | 1.787 | - | - | 1.892 |
| M1 Monthly | 1.379 | 1.091 | 1.074 | 1.164 | - | - | 1.123 |
| M3 Yearly | 3.167 | 2.774 | 2.860 | 3.417 | - | - | 3.223 |
| M3 Quarterly | 1.417 | 1.117 | 1.170 | 1.240 | - | - | 1.248 |
| M3 Monthly | 1.091 | 0.864 | 0.865 | 0.873 | - | - | 1.010 |
| M3 Other | 3.089 | 2.271 | 1.814 | 1.831 | - | - | 2.655 |
| M4 Yearly | 3.981 | 3.375 | 3.444 | 3.876 | - | - | 3.625 |
| M4 Quarterly | 1.417 | 1.231 | 1.161 | 1.228 | - | - | 1.316 |
| M4 Monthly | 1.150 | 0.970 | 0.948 | 0.962 | - | - | 1.080 |
| M4 Weekly | 0.587 | 0.546 | - | - | 0.504 | 0.550 | 0.481 |
| M4 Daily | 1.154 | 1.153 | 1.239 | 1.179 | - | - | 1.162 |
| M4 Hourly | 11.607 | 11.524 | - | - | 2.663 | 13.557 | 1.662 |
| Pedestrian Counts | 0.957 | 0.958 | - | - | 1.297 | 3.947 | 0.256 |
| KDD Cup | 1.645 | 1.646 | - | - | 1.394 | 1.982 | 1.265 |
| Carparts | 0.897 | 0.914 | 0.925 | 0.926 | - | - | 0.755 |
| Hospital | 0.813 | 0.761 | 0.765 | 0.787 | - | - | 0.782 |
| Covid Deaths | 7.776 | 7.793 | 5.326 | 6.117 | - | - | 8.731 |
| FRED-MD | 0.617 | 0.698 | 0.468 | 0.533 | - | - | 8.827 |
| Dominick | 0.582 | 0.610 | - | - | 72721475.060 | 0.796 | 0.980 |
| Weather | 0.677 | 0.749 | 0.702 | 0.746 | - | - | 3.046 |
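
One way to read the results above is to pick the method with the lowest mean MASE for each dataset. The Python sketch below does this for three rows copied from the table, treating `-` as "no result reported" (an assumption about what the dash means). The helper name `best_method` is hypothetical.

```python
METHODS = ["SES", "Theta", "ETS", "ARIMA", "TBATS", "DHR-ARIMA", "PR"]

# Three rows copied from the mean MASE table; "-" marks a missing entry.
ROWS = {
    "NN5 Daily": "1.521 0.885 0.865 1.013 - - 1.263",
    "M4 Hourly": "11.607 11.524 - - 2.663 13.557 1.662",
    "Sunspot": "0.128 0.128 0.128 0.067 - - 0.099",
}


def best_method(row):
    """Return (method, score) with the lowest MASE, skipping missing entries."""
    scores = {m: float(v) for m, v in zip(METHODS, row.split()) if v != "-"}
    name = min(scores, key=scores.get)
    return name, scores[name]


for dataset, row in ROWS.items():
    method, score = best_method(row)
    print(f"{dataset}: {method} ({score})")
# NN5 Daily: ETS (0.865)
# M4 Hourly: PR (1.662)
# Sunspot: ARIMA (0.067)
```

Because MASE is scale-free, the lowest value per row is a fair within-dataset comparison, but values should not be averaged across datasets without care, as the Dominick row for TBATS shows.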

About Us


We are a group of time series researchers from Monash University and the University of Sydney.

Contribute to Our Repository


We also encourage other researchers to contribute time series datasets to our repository, either by uploading them directly to our repository or by contacting us via email.

If there are any copyright issues with the datasets, please contact us via email.