Monash Time Series Forecasting Repository

The first repository containing datasets of related time series for global forecasting


Our Aim


Our aim is to introduce the first comprehensive time series forecasting repository containing datasets of related time series, to facilitate the evaluation of global forecasting models. All datasets are intended for research purposes only. Our repository contains 30 datasets, including both publicly available time series datasets (in different formats) and datasets curated by us. Many datasets have different versions based on the frequency and the inclusion of missing values, bringing the total number of dataset variations to 58. Furthermore, it includes both real-world and competition time series datasets covering varied domains.

We have also characterised each dataset and evaluated several baseline methods on them.

We recommend reading our paper for a detailed discussion of the datasets, their original sources, feature analysis and baseline evaluation. If you use our work, please cite the paper "Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, Pablo Montero-Manso, Monash Time Series Forecasting Archive".

@InProceedings{godahewa2021monash,
  author = "Godahewa, Rakshitha and Bergmeir, Christoph and Webb, Geoffrey I. and Hyndman, Rob J. and Montero-Manso, Pablo",
  title = "Monash Time Series Forecasting Archive",
  booktitle = "Neural Information Processing Systems Track on Datasets and Benchmarks",
  year = "2021"
}
          

Datasets


The following table lists the time series datasets that are currently available in our archive. The datasets are available in the .tsf format, a new format we propose for storing time series data, modelled on the .ts format pioneered by sktime. The wrappers to load data into R and Python environments are available in our GitHub repository.

Dataset | Domain | No. of Series | Min. Length | Max. Length | Competition | Multivariate | Download | Source
M1 | Multiple | 1001 | 15 | 150 | Yes | No | Yearly, Quarterly, Monthly | Makridakis et al., 1982
M3 | Multiple | 3003 | 20 | 144 | Yes | No | Yearly, Quarterly, Monthly, Other | Makridakis and Hibon, 2000
M4 | Multiple | 100000 | 19 | 9933 | Yes | No | Yearly, Quarterly, Monthly, Weekly, Daily, Hourly | Makridakis et al., 2020
Tourism | Tourism | 1311 | 11 | 333 | Yes | No | Yearly, Quarterly, Monthly | Athanasopoulos et al., 2011
CIF 2016 | Banking | 72 | 34 | 120 | Yes | No | Monthly | Stepnicka and Burda, 2017
London Smart Meters | Energy | 5560 | 288 | 39648 | No | No | W Missing, W/O Missing | Jean-Michel, 2019
Aus. Electricity Demand | Energy | 5 | 230736 | 232272 | No | No | Half Hourly | Curated by us
Wind Farms | Energy | 339 | 6345 | 527040 | No | No | W Missing, W/O Missing | Curated by us
Dominick | Sales | 115704 | 28 | 393 | No | No | Weekly | James M. Kilts Center, 2020
Bitcoin | Economic | 18 | 2659 | 4581 | No | No | W Missing, W/O Missing | Curated by us
Pedestrian Counts | Transport | 66 | 576 | 96424 | No | No | Hourly | City of Melbourne, 2020
Vehicle Trips | Transport | 329 | 70 | 243 | No | No | W Missing, W/O Missing | fivethirtyeight, 2015
KDD Cup 2018 | Nature | 270 | 9504 | 10920 | Yes | No | W Missing, W/O Missing | KDD Cup, 2018
Weather | Nature | 3010 | 1332 | 65981 | No | No | Daily | Sparks et al., 2020
NN5 | Banking | 111 | 791 | 791 | Yes | Yes | Daily W Missing, Daily W/O Missing, Weekly | Ben Taieb et al., 2012
Web Traffic | Web | 145063 | 803 | 803 | Yes | Yes | Daily W Missing, Daily W/O Missing, Weekly | Google, 2017
Solar | Energy | 137 | 52560 | 52560 | No | Yes | 10 Minutes, Weekly | Solar, 2020
Electricity | Energy | 321 | 26304 | 26304 | No | Yes | Hourly, Weekly | UCI, 2020
Car Parts | Sales | 2674 | 51 | 51 | No | Yes | W Missing, W/O Missing | Hyndman, 2015
FRED-MD | Economic | 107 | 728 | 728 | No | Yes | Monthly | McCracken and Ng, 2016
San Francisco Traffic | Transport | 862 | 17544 | 17544 | No | Yes | Hourly, Weekly | Caltrans, 2020
Rideshare | Transport | 2304 | 541 | 541 | No | Yes | W Missing, W/O Missing | Curated by us
Hospital | Health | 767 | 84 | 84 | No | Yes | Monthly | Hyndman, 2015
COVID Deaths | Nature | 266 | 212 | 212 | No | Yes | Daily | Johns Hopkins University, 2020
Temperature Rain | Nature | 32072 | 725 | 725 | No | Yes | W Missing, W/O Missing | Curated by us
Sunspot | Nature | 1 | 73931 | 73931 | No | No | W Missing, W/O Missing | Sunspot, 2015
Saugeen River Flow | Nature | 1 | 23741 | 23741 | No | No | Daily | McLeod and Gweon, 2013
US Births | Nature | 1 | 7305 | 7305 | No | No | Daily | Pruim et al., 2020
Solar Power | Energy | 1 | 7397222 | 7397222 | No | No | 4 Seconds | Curated by us
Wind Power | Energy | 1 | 7397147 | 7397147 | No | No | 4 Seconds | Curated by us
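
As an illustration only, the sketch below reads a small text in the style of the .tsf format mentioned above. The layout it assumes is not spelled out on this page: "@attribute" lines naming the per-series fields, other "@key value" header lines, an "@data" tag, then one colon-separated line per series whose last field is a comma-separated list of values with "?" marking a missing value. The function name read_tsf and the sample content are hypothetical. The wrappers in the GitHub repository remain the reference way to load the real files.

```python
import io


def read_tsf(handle):
    """Parse a .tsf-style text stream into (metadata, list of series records).

    Assumed layout (an assumption of this sketch, not a specification):
    header lines start with "@", comments with "#", and every line after
    "@data" is "<field>:<field>:...:<comma-separated values>".
    """
    attributes = []  # field names declared with "@attribute <name> <type>"
    metadata = {}    # remaining "@key value" header lines, e.g. frequency
    series = []
    in_data = False
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not in_data:
            if line.lower().startswith("@data"):
                in_data = True
            elif line.startswith("@attribute"):
                attributes.append(line.split()[1])
            elif line.startswith("@"):
                key, _, value = line[1:].partition(" ")
                metadata[key] = value.strip()
            continue
        # Data line: one field per declared attribute, then the values.
        parts = line.split(":")
        record = dict(zip(attributes, parts[:-1]))
        record["values"] = [
            None if v == "?" else float(v) for v in parts[-1].split(",")
        ]
        series.append(record)
    return metadata, series


# Hypothetical two-series sample, not taken from the archive.
SAMPLE = """\
# toy example
@relation toy
@attribute series_name string
@attribute start_timestamp date
@frequency monthly
@missing true
@data
T1:2020-01-01 00-00-00:1.5,2.0,?,4.0
T2:2020-03-01 00-00-00:10,11
"""

metadata, series = read_tsf(io.StringIO(SAMPLE))
print(metadata["frequency"], len(series), series[0]["values"])
```

Keeping each series as its own record, rather than forcing all series into one rectangular array, accommodates datasets whose minimum and maximum lengths differ, as many rows of the table do.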

Results


In our paper, we evaluated the performance of 13 baseline forecasting methods across the datasets in our repository. The baseline methods comprise 6 traditional univariate forecasting models: Simple Exponential Smoothing (SES), Theta (Assimakopoulos and Nikolopoulos, 2000), Exponential Smoothing (ETS, Hyndman, 2008), Auto-Regressive Integrated Moving Average (ARIMA, Box and Jenkins, 1990), Trigonometric Box-Cox ARMA Trend Seasonal (TBATS, De Livera et al., 2011) and Dynamic Harmonic Regression ARIMA (DHR-ARIMA, Hyndman and Athanasopoulos, 2021), and 7 global forecasting models: Pooled Regression (PR, Trapero et al., 2015), CatBoost (Prokhorenkova et al., 2018), Feed-Forward Neural Network (FFNN, Goodfellow et al., 2016), DeepAR (Salinas et al., 2020), N-BEATS (Oreshkin et al., 2019), WaveNet (Borovykh et al., 2017) and Transformer (Vaswani et al., 2017). We later ran the Prophet model as an additional baseline across all datasets in our repository. We also report the results obtained from the Informer model (Zhou et al., 2021) for some of the datasets.

For evaluation, we use 4 error metrics: the symmetric Mean Absolute Percentage Error (sMAPE), the Mean Absolute Scaled Error (MASE, Hyndman and Koehler, 2006), the Mean Absolute Error (MAE, Sammut and Webb, 2010), and the Root Mean Squared Error (RMSE). The sMAPE error measure was calculated in 2 ways: the original version and the modified version suggested by Suilin (2017). For more details of these baseline methods and error metrics, please refer to our paper.

The results table reports the mean MASE of each baseline method on each dataset in our repository. The best model for each dataset is highlighted in boldface. We use 2 versions of ARIMA. The results of the general ARIMA method are reported for yearly, quarterly, monthly, and daily datasets, whereas the results of DHR-ARIMA are reported for weekly datasets and multi-seasonal datasets such as 10-minutely, half-hourly, and hourly.

The results of all error metrics for all baselines except the Prophet and Informer models are available in the online appendix. The results of all error metrics for the Prophet model are available here. The results of all error metrics for the Informer model, the reasons for selecting the datasets used in the Informer experiments, and the Informer model configurations considered are available here.

We also expect to run new baselines in the future, and the results tables will be updated accordingly. As new forecasting models emerge rapidly, we also provide a simple interface for you to implement other statistical, machine learning and deep learning baselines. Our GitHub repository contains detailed instructions and example code snippets explaining how to integrate new forecasting models into our framework. Newly integrated forecasting models are evaluated in the same way as our baselines, using the same evaluation metrics, so their results are directly comparable with those of our baselines. After integrating new forecasting models, you can send us a pull request on GitHub to officially integrate your implementations into our framework. You are also invited to send us the results of your new forecasting models. If computationally feasible, we expect to re-execute the models and confirm the results. In the future, we expect to maintain two results tables here, one with the confirmed and one with the unconfirmed results of the forecasting models.


Dataset SES Theta TBATS ETS (DHR-)ARIMA PR CatBoost FFNN DeepAR N-BEATS WaveNet Transformer Prophet Informer*
M1 Yearly 4.938 4.191 3.499 3.771 4.479 4.588 4.427 4.355 4.603 4.384 4.666 5.519 5.633 -
M1 Quarterly 1.929 1.702 1.694 1.658 1.787 1.892 2.031 1.862 1.833 1.788 1.700 2.772 2.136 -
M1 Monthly 1.379 1.091 1.118 1.074 1.164 1.123 1.209 1.205 1.192 1.168 1.200 2.191 1.712 -
M3 Yearly 3.167 2.774 3.127 2.860 3.417 3.223 3.788 3.399 3.508 2.961 3.014 3.003 4.152 -
M3 Quarterly 1.417 1.117 1.256 1.170 1.240 1.248 1.441 1.329 1.310 1.182 1.290 2.452 1.672 -
M3 Monthly 1.091 0.864 0.861 0.865 0.873 1.010 1.065 1.011 1.167 0.934 1.008 1.454 1.375 -
M3 Other 3.089 2.271 1.848 1.814 1.831 2.655 3.178 2.615 2.975 2.390 2.127 2.781 4.694 -
M4 Yearly 3.981 3.375 3.437 3.444 3.876 3.625 3.649 - - - - - 5.256 -
M4 Quarterly 1.417 1.231 1.186 1.161 1.228 1.316 1.338 1.420 1.274 1.239 1.242 1.520 1.758 -
M4 Monthly 1.150 0.970 1.053 0.948 0.962 1.080 1.093 1.151 1.163 1.026 1.160 2.125 1.367 -
M4 Weekly 0.587 0.546 0.504 0.575 0.550 0.481 0.615 0.545 0.586 0.453 0.587 0.695 1.049 -
M4 Daily 1.154 1.153 1.157 1.239 1.179 1.162 1.593 1.141 2.212 1.218 1.157 1.377 3.698 -
M4 Hourly 11.607 11.524 2.663 26.690 13.557 1.662 1.771 2.862 2.145 2.247 1.680 8.840 1.776 -
Tourism Yearly 3.253 3.015 3.685 3.395 3.775 3.516 3.553 3.401 3.205 2.977 3.624 3.552 2.590 -
Tourism Quarterly 3.210 1.661 1.835 1.592 1.782 1.643 1.793 1.678 1.597 1.475 1.714 1.859 2.153 -
Tourism Monthly 3.306 1.649 1.751 1.526 1.589 1.678 1.699 1.582 1.409 1.574 1.482 1.571 2.008 -
CIF 2016 1.291 0.997 0.861 0.841 0.929 1.019 1.175 1.053 1.159 0.971 1.800 1.173 1.029 -
Aus. Elecdemand 1.857 1.867 1.174 5.663 2.574 0.780 0.705 1.222 1.591 1.014 1.102 1.113 1.414 -
Dominick 0.582 0.610 0.722 0.595 0.796 0.980 1.038 0.614 0.540 0.952 0.531 0.531 0.827 -
Bitcoin 4.327 4.344 4.611 2.718 4.030 2.664 2.888 6.006 6.394 7.254 5.315 8.462 11.089 -
Pedestrians 0.957 0.958 1.297 1.190 3.947 0.256 0.262 0.267 0.272 0.380 0.247 0.274 2.034 -
Vehicle Trips 1.224 1.244 1.860 1.305 1.282 1.212 1.176 1.843 1.929 2.143 1.851 2.532 2.428 -
KDD 1.645 1.646 1.394 1.787 1.982 1.265 1.233 1.228 1.699 1.600 1.185 1.696 1.186 -
Weather 0.677 0.749 0.689 0.702 0.746 3.046 0.762 0.638 0.631 0.717 0.721 0.650 0.880 -
NN5 Daily 1.521 0.885 0.858 0.865 1.013 1.263 0.973 0.941 0.919 1.134 0.916 0.958 0.883 0.933
NN5 Weekly 0.903 0.885 0.872 0.911 0.887 0.854 0.853 0.850 0.863 0.808 1.123 1.141 0.927 1.079
Kaggle Daily 0.924 0.928 0.947 1.231 0.890 - - - - - - - 1.785 -
Kaggle Weekly 0.698 0.694 0.622 0.770 0.815 1.021 1.928 0.689 0.758 0.667 0.628 0.888 1.196 -
Solar 10 Mins 1.451 1.452 3.936 1.451 1.034 1.451 2.504 1.450 1.450 1.573 - 1.451 1.821 1.614
Solar Weekly 1.215 1.224 0.916 1.134 0.848 1.053 1.530 1.045 0.725 1.184 1.961 0.574 1.508 2.408
Electricity Hourly 4.544 4.545 3.690 6.501 4.602 2.912 2.262 3.200 2.516 1.968 1.606 2.522 2.050 2.682
Electricity Weekly 1.536 1.476 0.792 1.526 0.878 0.916 0.815 0.769 1.005 0.800 1.250 1.770 0.924 1.444
Carparts 0.897 0.914 0.998 0.925 0.926 0.755 0.853 0.747 0.747 2.836 0.754 0.746 0.876 -
FRED-MD 0.617 0.698 0.502 0.468 0.533 8.827 0.947 0.601 0.640 0.604 0.806 1.823 1.843 17.839
Traffic Hourly 1.922 1.922 2.482 2.294 2.535 1.281 1.571 0.892 0.825 1.100 1.066 0.821 1.316 1.439
Traffic Weekly 1.116 1.121 1.148 1.125 1.191 1.122 1.116 1.150 1.182 1.094 1.233 1.555 1.084 1.323
Rideshare 3.014 3.641 3.067 4.040 1.530 3.019 2.908 4.198 4.029 3.877 3.009 4.040 4.666 -
Hospital 0.813 0.761 0.768 0.765 0.787 0.782 0.798 0.840 0.769 0.791 0.779 1.031 0.673 1.221
COVID 7.776 7.793 5.719 5.326 6.117 8.731 8.241 5.459 6.895 5.858 7.835 8.941 12.770 -
Temp. Rain 1.347 1.368 1.227 1.401 1.174 0.876 1.028 0.847 0.785 1.300 0.786 0.687 1.150 -
Sunspot 0.128 0.128 0.067 0.128 0.067 0.099 0.059 0.207 0.020 0.375 0.004 0.003 0.852 0.504
Saugeen 1.426 1.425 1.477 2.036 1.485 1.674 1.411 1.524 1.560 1.852 1.471 1.861 1.510 1.896
Births 4.343 2.138 1.453 1.529 1.917 2.094 1.609 2.032 1.548 1.537 1.837 1.650 5.626 2.220

*The results of the Informer model are only recorded for the datasets with equal-length series. For the datasets with unequal-length series, the Informer model must be executed separately for each series, which makes the execution time considerably high (for details, see here). Furthermore, intermittent datasets such as Carparts, Rideshare, Web Traffic, COVID Deaths and Temperature Rain are not considered for the Informer experiments.
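
The lowest mean MASE in a row identifies the best model for that dataset, so it can also be recovered from the table programmatically. The sketch below does this for three rows copied from the results table above. The helper name best_method is hypothetical, and "-" entries (no result reported) are skipped.

```python
# Column order of the results table, left to right.
METHODS = [
    "SES", "Theta", "TBATS", "ETS", "(DHR-)ARIMA", "PR", "CatBoost",
    "FFNN", "DeepAR", "N-BEATS", "WaveNet", "Transformer", "Prophet",
    "Informer",
]

# Three rows copied verbatim from the results table.
ROWS = """\
M1 Yearly 4.938 4.191 3.499 3.771 4.479 4.588 4.427 4.355 4.603 4.384 4.666 5.519 5.633 -
M4 Yearly 3.981 3.375 3.437 3.444 3.876 3.625 3.649 - - - - - 5.256 -
Tourism Yearly 3.253 3.015 3.685 3.395 3.775 3.516 3.553 3.401 3.205 2.977 3.624 3.552 2.590 -
"""


def best_method(row):
    """Return (dataset, method, mean MASE) for the lowest score in a row."""
    # Dataset names may contain spaces, so split the scores off the right.
    name, *scores = row.rsplit(None, len(METHODS))
    available = {m: float(s) for m, s in zip(METHODS, scores) if s != "-"}
    # On ties, min() keeps the leftmost method in the table.
    winner = min(available, key=available.get)
    return name, winner, available[winner]


results = [best_method(line) for line in ROWS.splitlines()]
for dataset, method, score in results:
    print(f"{dataset}: {method} ({score})")
```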

About Us


We are a group of time series researchers from Monash University and the University of Sydney.

Contribute to Our Repository


We also encourage other researchers to contribute time series datasets to our repository, either by uploading them directly to our repository or by contacting us via email.

If there are any copyright issues with the datasets, please contact us via email.

Acknowledgement


We are very grateful to the Department of Data Science and Artificial Intelligence of Monash University for their sponsorship.