Extractive Summarization in Low-Resource Languages: A Systematic Review
https://doi.org/10.59188/eduvest.v6i2.52390
Keywords: PRISMA, extractive summarization, low-resource languages, natural language processing

Abstract
NLP advancements have accelerated Automatic Text Summarization research, but development remains skewed toward high-resource languages. Low-resource languages are underrepresented due to limited digital corpora, scarce linguistic tools, and a lack of locally suitable pre-trained models. This research aims to map, identify, and analyze research trends related to extractive summarization in low-resource languages and to formulate future research directions. This study employs a systematic literature review following the PRISMA 2020 protocol. Articles were collected from the ScienceDirect, IEEE Xplore, and Google Scholar databases, covering the 2020–2025 period. A total of nine publications meeting the inclusion criteria were thoroughly analyzed based on six research questions (RQ) formulated using the PICOC framework. Most studies rely on unsupervised approaches such as TextRank, LexRank, and LSA, with key features including word frequency, sentence position, and semantic proximity. News corpora dominate the domain, while system performance evaluation remains limited to traditional metrics such as ROUGE and F1-Score. Identified challenges include limited annotated datasets, the absence of local NLP models, and a lack of meaning-based evaluation approaches. This study confirms that linguistic inequality persists in text summarization, with most research relying on unsupervised methods and lexical evaluation. To address this, three strategic directions are recommended: developing open, diverse language corpora; adopting adaptable lightweight NLP models; and advancing semantic evaluation approaches. Cross-community and interdisciplinary collaboration is essential for building more inclusive and sustainable automatic text summarization systems.
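
Of the unsupervised methods and metrics named in the abstract, TextRank-style graph ranking and ROUGE-style lexical overlap are the most frequently reported. The sketch below is a minimal, illustrative example of both ideas; it is not drawn from any of the reviewed systems, and the library choices (scikit-learn, networkx), function names, and toy sentences are assumptions for demonstration only.

# Minimal sketch of TextRank-style extractive ranking and a ROUGE-1-style
# recall score. Illustrative only; not taken from any reviewed study.
from collections import Counter

import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def textrank_summary(sentences, top_k=2):
    """Rank sentences by PageRank over a cosine-similarity graph."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(tfidf)            # sentence-by-sentence similarity
    graph = nx.from_numpy_array(sim)          # weighted, undirected graph
    scores = nx.pagerank(graph)               # TextRank = PageRank on this graph
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:top_k])           # keep original sentence order
    return [sentences[i] for i in chosen]


def rouge1_recall(candidate, reference):
    """Unigram-overlap recall, the core idea behind ROUGE-1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)


if __name__ == "__main__":
    doc = [
        "Low-resource languages lack large annotated corpora.",
        "Extractive summarization selects existing sentences instead of generating new ones.",
        "Graph-based methods such as TextRank need no labelled training data.",
        "News articles are the most common evaluation domain.",
    ]
    summary = textrank_summary(doc, top_k=2)
    print(summary)
    print(rouge1_recall(" ".join(summary), " ".join(doc)))

Because this family of methods needs no labelled training data, it remains attractive for low-resource settings, while lexical metrics such as ROUGE-1 illustrate the meaning-blind evaluation the review identifies as a limitation.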





