Comparison of the Accuracy of Stratified Random Sampling and Simple Random Sampling Methods in National Assessment (AN)

Authors

  • Januar Pribadi, Pusat Asesmen Pendidikan, Kemendikbudristek
  • Achmad Ridwan, Universitas Negeri Jakarta
  • Awaluddin Tjalla, Universitas Negeri Jakarta

DOI

https://doi.org/10.59188/eduvest.v5i6.51460

Keywords

standard error, mean square error, national assessment, simple random sampling, stratified random sampling

Abstract

Sampling methods are crucial for large-scale assessments. International surveys such as PISA, TIMSS, and PIRLS use stratified random sampling (StRS) to improve estimation accuracy, ensure that all subpopulations are represented, and keep administration efficient. Indonesia's National Assessment (AN) likewise applies StRS, stratifying the population by school size, class size, and gender. However, the accuracy of the AN sampling method, in terms of its reliability and validity, has not been tested since its implementation in 2021. This study compares the reliability and validity of the AN sampling method with those of simple random sampling (SRS). Reliability is assessed by the consistency of estimates across repeated sampling, indicated by a small standard error (SE) and a narrow confidence interval (CI). Validity is how accurately sample estimates reflect population parameters, evaluated through the mean square error (MSE). Using AN data from 1.9 million of 4.2 million junior high school students, the analysis shows no significant differences between StRS and SRS in the estimates of national population parameters. Both methods produce similar mean estimates (55) and standard deviations (10.7). However, StRS shows greater variability in weights, reflecting its ability to account for the sampling structure. At the school level, StRS outperforms SRS, yielding narrower CI and MSE ranges, which indicates superior reliability. Although the MSE differences are statistically significant, their practical impact is minor because the effect size is small and the dataset is large. These results suggest that StRS is more reliable for school-level reporting.
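
The comparison described in the abstract can be illustrated with a small repeated-sampling simulation. The sketch below is not the authors' procedure and does not use AN data: the three strata, their sizes, means, and standard deviations are hypothetical values chosen to make the effect visible, proportional allocation is assumed for StRS, and the function name `simulate` is invented for illustration. It draws many samples under each design, then reports the empirical SE (standard deviation of the estimates), the half-width of a normal-approximation 95% CI, and the MSE against the known population mean.

```python
import random
import statistics


def simulate(n_reps=500, sample_size=200, seed=0):
    """Compare SRS and proportionally allocated StRS on a synthetic population."""
    rng = random.Random(seed)

    # Hypothetical strata (NOT AN data): size, mean score, and spread are assumptions.
    strata = {
        "small": [rng.gauss(40, 5) for _ in range(3000)],
        "medium": [rng.gauss(55, 5) for _ in range(5000)],
        "large": [rng.gauss(70, 5) for _ in range(2000)],
    }
    population = [score for scores in strata.values() for score in scores]
    pop_size = len(population)
    true_mean = statistics.fmean(population)

    srs_estimates, strs_estimates = [], []
    for _ in range(n_reps):
        # SRS: one draw without replacement from the whole population.
        srs_estimates.append(statistics.fmean(rng.sample(population, sample_size)))

        # StRS: sample each stratum in proportion to its size,
        # then combine stratum means using the population shares as weights.
        estimate = 0.0
        for scores in strata.values():
            share = len(scores) / pop_size
            n_h = round(sample_size * share)
            estimate += share * statistics.fmean(rng.sample(scores, n_h))
        strs_estimates.append(estimate)

    def summarize(estimates):
        se = statistics.stdev(estimates)  # empirical standard error
        mse = statistics.fmean([(e - true_mean) ** 2 for e in estimates])
        return {"se": se, "ci95_halfwidth": 1.96 * se, "mse": mse}

    return true_mean, summarize(srs_estimates), summarize(strs_estimates)


true_mean, srs, strs = simulate()
print("population mean:", round(true_mean, 2))
print("SRS :", srs)
print("StRS:", strs)
```

Because the hypothetical strata differ strongly in their means, the stratified design removes most of the between-stratum variance, so its SE and MSE come out smaller than those of SRS. With strata that differ less, the gap narrows, which is consistent with the abstract's finding that the two designs agree closely at the national level.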

Published

2025-06-24