Time-Series Clustering Benchmark on Regional Economic Indicator

Yudhistira Dharma Putra

Abstract


This paper presents a benchmark study of time-series clustering on regional economic data from the World Bank Open Data (WBOD) repository, intended as a reference point for future researchers. The study compares the effectiveness of twenty techniques for grouping time series. The techniques combine three clustering algorithms (partitional, hierarchical, and fuzzy), two centroid types (K-means and K-medoids), and four distance measures (Dynamic Time Warping, Euclidean, Shape-Based Distance, and Triangular Global Alignment Kernel). An internal clustering validation index is used to compare the performance of the techniques. In addition, statistical tests are run on the performance of pairs of approaches to establish whether they can be meaningfully compared. Across all clustering algorithms evaluated, using K-means centroids outperformed using K-medoids. Among the distance measures, all perform well with every clustering algorithm, but the Triangular Global Alignment Kernel is the best (except with fuzzy C-means). The study also finds that solutions using the Dynamic Time Warping and Euclidean distance measures cannot be distinguished from one another (the Wilcoxon test result is insignificant). Finally, Shape-Based Distance gives the most consistent results of all the distance measures.
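
To illustrate how two of the distance measures named above differ, the following is a minimal Python sketch and not the study's implementation. It contrasts lock-step Euclidean distance with an unconstrained Dynamic Time Warping distance. The function names, the squared point-wise cost, and the toy series `s1` and `s2` are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Lock-step Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW using squared point-wise costs (assumed variant).

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


# Hypothetical toy series: the same peak, shifted by one time step.
s1 = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
s2 = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]

print("Euclidean:", euclidean(s1, s2))
print("DTW:", dtw(s1, s2))
```

In this toy case the Euclidean distance penalizes the one-step shift, while DTW warps the time axis and aligns the two peaks, so the choice of measure can change which series end up in the same cluster.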


Keywords


time-series clustering; benchmark; unsupervised learning; regional economy



DOI: https://doi.org/10.33258/birci.v5i1.4374


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

 
