The first repository containing datasets of related time series for global forecasting
Our aim is to introduce the first comprehensive time series forecasting repository containing datasets of related time series, to facilitate the evaluation of global forecasting models. All datasets are intended for research purposes only. Our repository contains 30 datasets, including both publicly available time series datasets (in different formats) and datasets curated by us. Many datasets have different versions based on the frequency and the inclusion of missing values, bringing the total number of dataset variations to 58. The repository includes both real-world and competition time series datasets covering varied domains.
We have also characterised each dataset and run several baseline methods on them.
We recommend reading our paper for a detailed discussion of the datasets, their original sources, feature analysis and baseline evaluation. If you use our work, please cite the paper "Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, Pablo Montero-Manso, Monash Time Series Forecasting Archive".
@InProceedings{godahewa2021monash,
  author = "Godahewa, Rakshitha and Bergmeir, Christoph and Webb, Geoffrey I. and Hyndman, Rob J. and Montero-Manso, Pablo",
  title = "Monash Time Series Forecasting Archive",
  booktitle = "Neural Information Processing Systems Track on Datasets and Benchmarks",
  year = "2021"
}
The following table lists the time series datasets that are currently available in our archive. The datasets are provided in .tsf format, a new format we propose for storing time series data, following the .ts format pioneered by sktime. Wrappers to load the data into R and Python environments are available in our GitHub repository.
Dataset | Domain | No. of Series | Min. Length | Max. Length | Competition | Multivariate | Download | Source |
---|---|---|---|---|---|---|---|---|
M1 | Multiple | 1001 | 15 | 150 | Yes | No | Yearly, Quarterly, Monthly | Makridakis et al., 1982 |
M3 | Multiple | 3003 | 20 | 144 | Yes | No | Yearly, Quarterly, Monthly, Other | Makridakis and Hibon, 2000 |
M4 | Multiple | 100000 | 19 | 9933 | Yes | No | Yearly, Quarterly, Monthly, Weekly, Daily, Hourly | Makridakis et al., 2020 |
Tourism | Tourism | 1311 | 11 | 333 | Yes | No | Yearly, Quarterly, Monthly | Athanasopoulos et al., 2011 |
CIF 2016 | Banking | 72 | 34 | 120 | Yes | No | Monthly | Stepnicka and Burda, 2017 |
London Smart Meters | Energy | 5560 | 288 | 39648 | No | No | W Missing, W/O Missing | Jean-Michel, 2019 |
Aus. Electricity Demand | Energy | 5 | 230736 | 232272 | No | No | Half Hourly | Curated by us |
Wind Farms | Energy | 339 | 6345 | 527040 | No | No | W Missing, W/O Missing | Curated by us |
Dominick | Sales | 115704 | 28 | 393 | No | No | Weekly | James M. Kilts Center, 2020 |
Bitcoin | Economic | 18 | 2659 | 4581 | No | No | W Missing, W/O Missing | Curated by us |
Pedestrian Counts | Transport | 66 | 576 | 96424 | No | No | Hourly | City of Melbourne, 2020 |
Vehicle Trips | Transport | 329 | 70 | 243 | No | No | W Missing, W/O Missing | fivethirtyeight, 2015 |
KDD Cup 2018 | Nature | 270 | 9504 | 10920 | Yes | No | W Missing, W/O Missing | KDD Cup, 2018 |
Weather | Nature | 3010 | 1332 | 65981 | No | No | Daily | Sparks et al., 2020 |
NN5 | Banking | 111 | 791 | 791 | Yes | Yes | Daily W Missing, Daily W/O Missing, Weekly | Ben Taieb et al., 2012 |
Web Traffic | Web | 145063 | 803 | 803 | Yes | Yes | Daily W Missing, Daily W/O Missing, Weekly | Google, 2017 |
Solar | Energy | 137 | 52560 | 52560 | No | Yes | 10 Minutes, Weekly | Solar, 2020 |
Electricity | Energy | 321 | 26304 | 26304 | No | Yes | Hourly, Weekly | UCI, 2020 |
Car Parts | Sales | 2674 | 51 | 51 | No | Yes | W Missing, W/O Missing | Hyndman, 2015 |
FRED-MD | Economic | 107 | 728 | 728 | No | Yes | Monthly | McCracken and Ng, 2016 |
San Francisco Traffic | Transport | 862 | 17544 | 17544 | No | Yes | Hourly, Weekly | Caltrans, 2020 |
Rideshare | Transport | 2304 | 541 | 541 | No | Yes | W Missing, W/O Missing | Curated by us |
Hospital | Health | 767 | 84 | 84 | No | Yes | Monthly | Hyndman, 2015 |
COVID Deaths | Nature | 266 | 212 | 212 | No | Yes | Daily | Johns Hopkins University, 2020 |
Temperature Rain | Nature | 32072 | 725 | 725 | No | Yes | W Missing, W/O Missing | Curated by us |
Sunspot | Nature | 1 | 73931 | 73931 | No | No | W Missing, W/O Missing | Sunspot, 2015 |
Saugeen River Flow | Nature | 1 | 23741 | 23741 | No | No | Daily | McLeod and Gweon, 2013 |
US Births | Nature | 1 | 7305 | 7305 | No | No | Daily | Pruim et al., 2020 |
Solar Power | Energy | 1 | 7397222 | 7397222 | No | No | 4 Seconds | Curated by us |
Wind Power | Energy | 1 | 7397147 | 7397147 | No | No | 4 Seconds | Curated by us |
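The wrappers in the GitHub repository are the recommended way to load the .tsf files. Purely as an illustration of the idea, the sketch below parses a small .tsf-like string. The layout it assumes (header lines starting with `@`, `@attribute` lines naming per-series fields, an `@data` marker, then one colon-separated line per series with comma-separated values and `?` for missing values) is an assumption made for this example, and `parse_tsf` is a hypothetical helper, not part of the repository's code.

```python
def parse_tsf(text):
    """Parse a .tsf-like string into (metadata, series).

    Assumed layout (illustrative only): '@attribute <name> <type>' lines,
    other '@key value' header lines, an '@data' marker, then data lines of
    the form 'attr1:attr2:...:v1,v2,...' with '?' marking a missing value.
    """
    attributes = []  # names of the per-series attributes, in file order
    metadata = {}    # remaining header fields such as frequency
    series = []
    in_data = False

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not in_data:
            if line.lower() == "@data":
                in_data = True
            elif line.startswith("@attribute"):
                attributes.append(line.split()[1])
            elif line.startswith("@"):
                parts = line[1:].split(maxsplit=1)
                metadata[parts[0]] = parts[1] if len(parts) > 1 else ""
            continue
        # Data line: attribute fields first, the value list last.
        fields = line.split(":")
        record = dict(zip(attributes, fields[:len(attributes)]))
        record["values"] = [
            None if v == "?" else float(v) for v in fields[-1].split(",")
        ]
        series.append(record)
    return metadata, series


sample = """# made-up example, not taken from the archive
@relation demo
@attribute series_name string
@attribute start_timestamp date
@frequency monthly
@missing true
@data
T1:1990-01-01 00-00-00:1.0,2.5,?
T2:1991-06-01 00-00-00:4,5
"""

metadata, series = parse_tsf(sample)
```

Keeping missing values as `None` leaves the choice of imputation to the caller, which matters because many datasets in the archive are offered both with and without missing values.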
In our paper, we evaluated the performance of 13 baseline forecasting methods across the datasets in our repository. The baseline methods include 6 traditional univariate forecasting models: Simple Exponential Smoothing (SES), Theta (Assimakopoulos and Nikolopoulos, 2000), Exponential Smoothing (ETS, Hyndman, 2008), Auto-Regressive Integrated Moving Average (ARIMA, Box and Jenkins, 1990), Trigonometric Box-Cox ARMA Trend Seasonal (TBATS, Livera et al., 2011) and Dynamic Harmonic Regression ARIMA (DHR-ARIMA, Hyndman and Athanasopoulos, 2021), and 7 global forecasting models: Pooled Regression (PR, Trapero et al., 2015), CatBoost (Prokhorenkova et al., 2018), Feed-Forward Neural Network (FFNN, Goodfellow et al., 2016), DeepAR (Salinas et al., 2020), N-BEATS (Oreshkin et al., 2019), WaveNet (Borovykh et al., 2017) and Transformer (Vaswani et al., 2017). We later ran the Prophet model as an additional baseline across all datasets in our repository, and we also report results from the Informer model (Zhou et al., 2021) for some of the datasets.
For evaluation, we use 4 error metrics: the symmetric Mean Absolute Percentage Error (sMAPE), the Mean Absolute Scaled Error (MASE, Hyndman and Koehler, 2006), the Mean Absolute Error (MAE, Sammut and Webb, 2010), and the Root Mean Squared Error (RMSE). The sMAPE was calculated in 2 ways: the original version and the modified version suggested by Suilin (2017). For more details on these baseline methods and error metrics, please refer to our paper.
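To make the four error metrics concrete, here is a minimal sketch using their standard textbook definitions for a single series. It is not the evaluation code from our framework: the function names are chosen for this example, MASE is scaled by the in-sample error of a seasonal naive forecast with an assumed `seasonality` parameter, and the modified sMAPE of Suilin (2017) is not shown.

```python
import math


def mae(actual, forecast):
    """Mean Absolute Error over the forecast horizon."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Squared Error over the forecast horizon."""
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))


def smape(actual, forecast):
    """Original sMAPE, in percent.

    Undefined when an actual value and its forecast are both zero,
    which is one motivation for modified variants.
    """
    terms = [
        abs(a - f) / ((abs(a) + abs(f)) / 2)
        for a, f in zip(actual, forecast)
    ]
    return 100.0 * sum(terms) / len(terms)


def mase(train, actual, forecast, seasonality=1):
    """MAE scaled by the in-sample MAE of a seasonal naive forecast."""
    naive_errors = [
        abs(train[i] - train[i - seasonality])
        for i in range(seasonality, len(train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / scale


# Toy series: five training points, two test points.
train = [1, 2, 3, 4, 5]
actual = [6, 7]
forecast = [5, 9]

print(mae(actual, forecast))          # 1.5
print(mase(train, actual, forecast))  # 1.5, since the naive in-sample MAE is 1
```

A dataset-level figure such as the mean MASE is then the average of the per-series values.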
The results table reports the mean MASE of each baseline method across the datasets in our repository. The best model for each dataset is highlighted in boldface. We use 2 versions of ARIMA: the results of the general ARIMA method are reported for yearly, quarterly, monthly, and daily datasets, whereas the results of DHR-ARIMA are reported for weekly datasets and multi-seasonal datasets such as 10 minutely, half hourly, and hourly.
The results of all error metrics across all baselines except the Prophet and Informer models are available in the online appendix. The results of all error metrics across the Prophet model are available here. The results of all error metrics across the Informer model, the reasons for selecting the datasets for the Informer model experiments and the considered Informer model configurations are available here.
We also expect to run new baselines in the future, and the results tables will be updated accordingly. As new forecasting models emerge rapidly, we provide a simple interface for you to implement other statistical, machine learning and deep learning baselines. Our GitHub repository contains detailed instructions and example code snippets explaining how to integrate new forecasting models into our framework. Newly integrated forecasting models are evaluated in the same way as our baselines, using the same evaluation metrics, so their results are directly comparable with those of our baselines. After integrating new forecasting models, you can send us a pull request on GitHub to officially integrate your implementations into our framework. You are also invited to send us the results of your new forecasting models. If computationally feasible, we expect to re-execute the models and confirm the results. In the future, we expect to maintain two results tables here, one with confirmed and one with unconfirmed results of the forecasting models.
Dataset | SES | Theta | TBATS | ETS | (DHR-)ARIMA | PR | CatBoost | FFNN | DeepAR | N-BEATS | WaveNet | Transformer | Prophet | Informer* |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
M1 Yearly | 4.938 | 4.191 | 3.499 | 3.771 | 4.479 | 4.588 | 4.427 | 4.355 | 4.603 | 4.384 | 4.666 | 5.519 | 5.633 | - |
M1 Quarterly | 1.929 | 1.702 | 1.694 | 1.658 | 1.787 | 1.892 | 2.031 | 1.862 | 1.833 | 1.788 | 1.700 | 2.772 | 2.136 | - |
M1 Monthly | 1.379 | 1.091 | 1.118 | 1.074 | 1.164 | 1.123 | 1.209 | 1.205 | 1.192 | 1.168 | 1.200 | 2.191 | 1.712 | - |
M3 Yearly | 3.167 | 2.774 | 3.127 | 2.860 | 3.417 | 3.223 | 3.788 | 3.399 | 3.508 | 2.961 | 3.014 | 3.003 | 4.152 | - |
M3 Quarterly | 1.417 | 1.117 | 1.256 | 1.170 | 1.240 | 1.248 | 1.441 | 1.329 | 1.310 | 1.182 | 1.290 | 2.452 | 1.672 | - |
M3 Monthly | 1.091 | 0.864 | 0.861 | 0.865 | 0.873 | 1.010 | 1.065 | 1.011 | 1.167 | 0.934 | 1.008 | 1.454 | 1.375 | - |
M3 Other | 3.089 | 2.271 | 1.848 | 1.814 | 1.831 | 2.655 | 3.178 | 2.615 | 2.975 | 2.390 | 2.127 | 2.781 | 4.694 | - |
M4 Yearly | 3.981 | 3.375 | 3.437 | 3.444 | 3.876 | 3.625 | 3.649 | - | - | - | - | - | 5.256 | - |
M4 Quarterly | 1.417 | 1.231 | 1.186 | 1.161 | 1.228 | 1.316 | 1.338 | 1.420 | 1.274 | 1.239 | 1.242 | 1.520 | 1.758 | - |
M4 Monthly | 1.150 | 0.970 | 1.053 | 0.948 | 0.962 | 1.080 | 1.093 | 1.151 | 1.163 | 1.026 | 1.160 | 2.125 | 1.367 | - |
M4 Weekly | 0.587 | 0.546 | 0.504 | 0.575 | 0.550 | 0.481 | 0.615 | 0.545 | 0.586 | 0.453 | 0.587 | 0.695 | 1.049 | - |
M4 Daily | 1.154 | 1.153 | 1.157 | 1.239 | 1.179 | 1.162 | 1.593 | 1.141 | 2.212 | 1.218 | 1.157 | 1.377 | 3.698 | - |
M4 Hourly | 11.607 | 11.524 | 2.663 | 26.690 | 13.557 | 1.662 | 1.771 | 2.862 | 2.145 | 2.247 | 1.680 | 8.840 | 1.776 | - |
Tourism Yearly | 3.253 | 3.015 | 3.685 | 3.395 | 3.775 | 3.516 | 3.553 | 3.401 | 3.205 | 2.977 | 3.624 | 3.552 | 2.590 | - |
Tourism Quarterly | 3.210 | 1.661 | 1.835 | 1.592 | 1.782 | 1.643 | 1.793 | 1.678 | 1.597 | 1.475 | 1.714 | 1.859 | 2.153 | - |
Tourism Monthly | 3.306 | 1.649 | 1.751 | 1.526 | 1.589 | 1.678 | 1.699 | 1.582 | 1.409 | 1.574 | 1.482 | 1.571 | 2.008 | - |
CIF 2016 | 1.291 | 0.997 | 0.861 | 0.841 | 0.929 | 1.019 | 1.175 | 1.053 | 1.159 | 0.971 | 1.800 | 1.173 | 1.029 | - |
Aus. Elecdemand | 1.857 | 1.867 | 1.174 | 5.663 | 2.574 | 0.780 | 0.705 | 1.222 | 1.591 | 1.014 | 1.102 | 1.113 | 1.414 | - |
Dominick | 0.582 | 0.610 | 0.722 | 0.595 | 0.796 | 0.980 | 1.038 | 0.614 | 0.540 | 0.952 | 0.531 | 0.531 | 0.827 | - |
Bitcoin | 4.327 | 4.344 | 4.611 | 2.718 | 4.030 | 2.664 | 2.888 | 6.006 | 6.394 | 7.254 | 5.315 | 8.462 | 11.089 | - |
Pedestrians | 0.957 | 0.958 | 1.297 | 1.190 | 3.947 | 0.256 | 0.262 | 0.267 | 0.272 | 0.380 | 0.247 | 0.274 | 2.034 | - |
Vehicle Trips | 1.224 | 1.244 | 1.860 | 1.305 | 1.282 | 1.212 | 1.176 | 1.843 | 1.929 | 2.143 | 1.851 | 2.532 | 2.428 | - |
KDD | 1.645 | 1.646 | 1.394 | 1.787 | 1.982 | 1.265 | 1.233 | 1.228 | 1.699 | 1.600 | 1.185 | 1.696 | 1.186 | - |
Weather | 0.677 | 0.749 | 0.689 | 0.702 | 0.746 | 3.046 | 0.762 | 0.638 | 0.631 | 0.717 | 0.721 | 0.650 | 0.880 | - |
NN5 Daily | 1.521 | 0.885 | 0.858 | 0.865 | 1.013 | 1.263 | 0.973 | 0.941 | 0.919 | 1.134 | 0.916 | 0.958 | 0.883 | 0.933 |
NN5 Weekly | 0.903 | 0.885 | 0.872 | 0.911 | 0.887 | 0.854 | 0.853 | 0.850 | 0.863 | 0.808 | 1.123 | 1.141 | 0.927 | 1.079 |
Kaggle Daily | 0.924 | 0.928 | 0.947 | 1.231 | 0.890 | - | - | - | - | - | - | - | 1.785 | - |
Kaggle Weekly | 0.698 | 0.694 | 0.622 | 0.770 | 0.815 | 1.021 | 1.928 | 0.689 | 0.758 | 0.667 | 0.628 | 0.888 | 1.196 | - |
Solar 10 Mins | 1.451 | 1.452 | 3.936 | 1.451 | 1.034 | 1.451 | 2.504 | 1.450 | 1.450 | 1.573 | - | 1.451 | 1.821 | 1.614 |
Solar Weekly | 1.215 | 1.224 | 0.916 | 1.134 | 0.848 | 1.053 | 1.530 | 1.045 | 0.725 | 1.184 | 1.961 | 0.574 | 1.508 | 2.408 |
Electricity Hourly | 4.544 | 4.545 | 3.690 | 6.501 | 4.602 | 2.912 | 2.262 | 3.200 | 2.516 | 1.968 | 1.606 | 2.522 | 2.050 | 2.682 |
Electricity Weekly | 1.536 | 1.476 | 0.792 | 1.526 | 0.878 | 0.916 | 0.815 | 0.769 | 1.005 | 0.800 | 1.250 | 1.770 | 0.924 | 1.444 |
Carparts | 0.897 | 0.914 | 0.998 | 0.925 | 0.926 | 0.755 | 0.853 | 0.747 | 0.747 | 2.836 | 0.754 | 0.746 | 0.876 | - |
FRED-MD | 0.617 | 0.698 | 0.502 | 0.468 | 0.533 | 8.827 | 0.947 | 0.601 | 0.640 | 0.604 | 0.806 | 1.823 | 1.843 | 17.839 |
Traffic Hourly | 1.922 | 1.922 | 2.482 | 2.294 | 2.535 | 1.281 | 1.571 | 0.892 | 0.825 | 1.100 | 1.066 | 0.821 | 1.316 | 1.439 |
Traffic Weekly | 1.116 | 1.121 | 1.148 | 1.125 | 1.191 | 1.122 | 1.116 | 1.150 | 1.182 | 1.094 | 1.233 | 1.555 | 1.084 | 1.323 |
Rideshare | 3.014 | 3.641 | 3.067 | 4.040 | 1.530 | 3.019 | 2.908 | 4.198 | 4.029 | 3.877 | 3.009 | 4.040 | 4.666 | - |
Hospital | 0.813 | 0.761 | 0.768 | 0.765 | 0.787 | 0.782 | 0.798 | 0.840 | 0.769 | 0.791 | 0.779 | 1.031 | 0.673 | 1.221 |
COVID | 7.776 | 7.793 | 5.719 | 5.326 | 6.117 | 8.731 | 8.241 | 5.459 | 6.895 | 5.858 | 7.835 | 8.941 | 12.770 | - |
Temp. Rain | 1.347 | 1.368 | 1.227 | 1.401 | 1.174 | 0.876 | 1.028 | 0.847 | 0.785 | 1.300 | 0.786 | 0.687 | 1.150 | - |
Sunspot | 0.128 | 0.128 | 0.067 | 0.128 | 0.067 | 0.099 | 0.059 | 0.207 | 0.020 | 0.375 | 0.004 | 0.003 | 0.852 | 0.504 |
Saugeen | 1.426 | 1.425 | 1.477 | 2.036 | 1.485 | 1.674 | 1.411 | 1.524 | 1.560 | 1.852 | 1.471 | 1.861 | 1.510 | 1.896 |
Births | 4.343 | 2.138 | 1.453 | 1.529 | 1.917 | 2.094 | 1.609 | 2.032 | 1.548 | 1.537 | 1.837 | 1.650 | 5.626 | 2.220 |
*The results of the Informer model are only recorded for the datasets with equal-length series. For the datasets with unequal-length series, the Informer model has to be executed separately for each series, which makes the execution time considerably high (for details, see here). Furthermore, the intermittent datasets such as Carparts, Rideshare, Web Traffic, Covid Deaths and Temperature Rain are not considered for the Informer experiments.
We also encourage other researchers to contribute time series datasets to our repository, either by uploading them directly or by contacting us via email.
If there are any copyright issues with the datasets, please contact us via email.
We are very grateful to the Department of Data Science and Artificial Intelligence of Monash University for their sponsorship.