How to cite:
Sulastri, Eri Zuliarso, Arief Jananto (2022). Prediction of the Development
of Covid-19 Case In Indonesia Based on Google Trend Analysis. Journal
Eduvest. Vol 2(7): 1.407-1.425
E-ISSN: 2775-3727
Published by: https://greenpublisher.id/
Eduvest Journal of Universal Studies
Volume 2 Number 7, July, 2022
p-ISSN 2775-3735, e-ISSN 2775-3727
PREDICTION OF THE DEVELOPMENT OF COVID-19 CASE
IN INDONESIA BASED ON GOOGLE TREND ANALYSIS
Sulastri¹, Eri Zuliarso², Arief Jananto³
Universitas Stikubank (Unisbank) Semarang, Indonesia
ajananto09@edu.unisbank.ac.id
ABSTRACT
The global outbreak of the coronavirus disease (COVID-19)
has recently hit many countries around the world. Indonesia
is one of the 10 most affected countries. Search engines such
as Google provide data on search activity in a population, and
this data may be useful for analyzing epidemics. Leveraging
data mining methods on electronic resource data can provide
better insights into the COVID-19 outbreak to manage health
crises in every country and around the world. This study aims
to predict the incidence of COVID-19 by utilizing data from
the Covid-19 Task Force and the Google Trends website.
Linear regression and long short-term memory (LSTM) models
were used to estimate the number of positive COVID-19 cases.
KEYWORDS
Covid-19, Long Short-Term Memory, Google Trend
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International
INTRODUCTION
Long holidays often encourage people to travel, even though movement and
crowds can increase the number of Covid-19 cases (Wen, Kozak, Yang, & Liu,
2020). According to data from the Covid-19 Handling Task Force, positive cases
trend upward during every holiday period (Sharpe Jr, Kuszyk, &
Mossa-Basha, 2021). Google Trends is a website owned by Google Inc. that reports
trends in keyword use on the Google search engine as well as trending news
(Jun, Yoo, & Choi, 2018). One of the uses of Google Trends is research. Recurrent
neural networks (RNNs) have been used for sequential time series applications with
temporal dependencies, because an RNN processes the current data by using the
previous data. However, an RNN is difficult to train on data with long-term
dependencies, a problem that is solved
by one of the RNN variants, the long short-term memory (LSTM) network. The LSTM,
proposed by Hochreiter and Schmidhuber, is an advanced version of the RNN that
overcomes the limitations of the RNN by using a hidden layer unit known as a memory
cell. The memory cells are self-connected, store the temporal state of the network,
and are controlled through three gates: the input gate, the output gate, and the
forget gate (Gers & Schmidhuber, 2001).
The input and output gates control the flow of input into and output from
the memory cells throughout the network (Sak, Senior, & Beaufays, 2014). In addition, a
forget gate has been added to the memory cell, which passes high-weighted output
information from the previous neuron to the next neuron. Which information resides in
memory depends on the activation level: if the input unit has a high activation, the
information is stored in the memory cell, and if the output unit has a high activation,
the information is passed on to the next neuron (Shahid, Zameer, & Muneeb, 2020).
Otherwise, the high-weighted input information remains in the memory cell.
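The gate mechanism described above can be illustrated with a minimal sketch of one time step of a single-unit LSTM cell. It follows the standard textbook formulation, which is not necessarily the exact variant used in this study. The function name `lstm_step` and the parameter keys (`wi`, `ui`, `bi`, and so on) are hypothetical names chosen for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell (scalar input and state).

    p is a dict of hypothetical scalar weights and biases, one set per gate.
    """
    # Each gate looks at the current input x and the previous output h_prev.
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add new content
    h = o * math.tanh(c)     # expose part of the memory as the cell output
    return h, c


# Illustrative parameters: forget gate open, input gate closed,
# so the memory cell keeps its previous content.
keep_memory = dict(wi=0, ui=0, bi=-20, wf=0, uf=0, bf=20,
                   wo=0, uo=0, bo=20, wg=1, ug=0, bg=0)
h, c = lstm_step(5.0, 0.0, 0.7, keep_memory)
```

With the input gate nearly closed and the forget gate nearly open, the cell state stays at its previous value regardless of the new input, which is how the memory cell carries information across many time steps.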
This study analyzes the development of Covid-19 cases in relation to several
keywords on Google Trends and tests several algorithms for that analysis (Pan,
Nguyen, Abu-Gellban, & Zhang, 2020).
RESEARCH METHOD
In the first stage of this research, we explore the data in Google Trends. The
keywords used are 'covid 19', 'ppkm', 'lockdown', 'ptm', 'wfh', 'vaccination', 'cluster',
'coronavirus', 'psbb', and 'delta variant', for the period from 2020-01-01 to
2021-11-10. The study began by downloading data on the daily development of the spread
from the COVID-19 Data Repository by the Center for Systems Science and Engineering
(CSSE) at Johns Hopkins University (https://github.com/CSSEGISandData/COVID-19)
and from
https://data.humdata.org/dataset/indonesia-covid-19-cases-recoveries-and-deaths-per-province
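As an illustration of this download step, the sketch below converts a cumulative case table into daily new cases. It assumes the wide layout of the CSSE global time series files (one row per region, one column per date in month/day/year form). The function name `daily_new_cases` is hypothetical and the sample values are made up for illustration, not taken from the repository.

```python
import csv
import io
from datetime import datetime


def daily_new_cases(csv_text, country):
    """Return [(date, new_cases)] for one country from a wide cumulative table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    meta = {"Province/State", "Country/Region", "Lat", "Long"}
    date_cols = [name for name in reader.fieldnames if name not in meta]
    totals = [0] * len(date_cols)
    for row in reader:
        if row["Country/Region"] == country:
            # Sum over rows in case the country is split into provinces.
            for k, col in enumerate(date_cols):
                totals[k] += int(row[col])
    result = []
    previous = 0  # the first day therefore reports its cumulative count
    for col, total in zip(date_cols, totals):
        day = datetime.strptime(col, "%m/%d/%y").date()
        result.append((day, total - previous))  # cumulative -> daily difference
        previous = total
    return result


# Made-up sample in the assumed layout.
sample = (
    "Province/State,Country/Region,Lat,Long,3/1/20,3/2/20,3/3/20\n"
    ",Indonesia,-0.7893,113.9213,0,2,2\n"
    ",Malaysia,4.2105,101.9758,25,29,36\n"
)
rows = daily_new_cases(sample, "Indonesia")
```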
The data in Google Trends is a random sample of Google search data. This data
is anonymized (identity not disclosed), classified (search query topics defined), and
aggregated (grouped together). Google Trends data can be filtered in two ways: real time
and non-real time. Real time refers to a random sample of searches from the previous
seven days, while non-real time refers to a random sample of the entire Google dataset,
which can range from 2004 to 36 hours ago (Pretorius, Kruger, & Bezuidenhout, 2022).
The real-time and non-real-time data are two separate random samples, so a graph
shows one or the other, but not both at the same time.
RESULT AND DISCUSSION
1. Analysis of Community Activities on Daily Cases of Confirmed Covid 19 and
their Visualization.
This step begins with preparing the data to be used, namely the daily case variable
and the global mobility variables, then visualizing daily cases against each mobility
variable and analyzing their correlation.
Eduvest Journal of Universal Studies
Volume 2 Number 7, July 2022
1.409 http://eduvest.greenvest.co.id
Figure 1.11 Preparing data for analysis of community activities and daily confirmed cases
of covid 19
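A data preparation step of this kind can be sketched as a join of the two series on their shared dates. The sketch assumes both series are keyed by ISO date strings. The function name `align_by_date` and the sample values are hypothetical and serve only as an illustration.

```python
def align_by_date(cases, mobility):
    """Keep only the dates present in both series, in chronological order."""
    shared = sorted(set(cases) & set(mobility))  # ISO dates sort chronologically
    return [(day, cases[day], mobility[day]) for day in shared]


# Made-up values: daily cases and one mobility variable
# (for example retail_and_recreation, as a change from baseline).
daily_cases = {"2021-07-02": 150, "2021-07-01": 100, "2021-07-03": 120}
retail_and_recreation = {"2021-07-01": -10, "2021-07-02": -25}
merged = align_by_date(daily_cases, retail_and_recreation)
```

Dates that appear in only one of the two sources are dropped, so each remaining row pairs a case count with a mobility value for the same day.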
Figure 1.12 Coding for visualization of daily cases and community activities
Figure 1.13 Visualization between Retail_and_recreation and daily cases
Figure 1.13 shows that when daily cases decrease, retail and recreation activity
increases; conversely, when daily cases increase, retail and recreation activity decreases.
Figure 1.14 Visualization between grocery and pharmacy and daily cases
Figure 1.14 shows that grocery and pharmacy activity is consistently high during the
pandemic, which suggests that public demand for medicines remained quite high.
Figure 1.15 Visualization between parks and daily cases
Figure 1.15 shows that when daily cases decrease, activity in parks increases;
conversely, when daily cases increase, community activity in parks decreases.
Figure 1.16 Visualization between transit_station and daily cases
Figure 1.16 shows that when daily cases decrease, activity at stations
(transit_station) increases; conversely, when daily cases increase, activity at
stations (transit_station) decreases.
Figure 1.17 Visualization between workplace and daily cases
Figure 1.17 shows that in the early days of the pandemic work activities were
quite high, but when daily cases were high, work activities fell drastically
due to the lockdown.
Figure 1.18 Visualization between residential and daily cases
Figure 1.18 shows that residential activity is high at the beginning of the
pandemic, when daily cases are low; when daily cases are high, residential activity
decreases, but it does not disappear entirely.
Figure 1.19 Pearson Correlation Analysis of daily cases and community activities
Figure 1.19 shows the results of the correlation analysis between daily cases and
community activities using the Pearson correlation. The figure shows negative
correlations for retail and recreation, parks, transit stations, and workplaces,
meaning that if daily cases increase, these four activities decrease, and vice
versa. The correlations of daily cases with grocery and pharmacy and with residential
are positive, although the values are small.
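For reference, the Pearson coefficient used in this analysis can be computed as in the minimal sketch below. The function name `pearson` and the sample series are illustrative assumptions. The values are made up to show a case where one series falls as the other rises.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up series: cases rise while a mobility variable falls.
cases = [100, 200, 300, 400]
mobility = [-5, -10, -15, -20]
r = pearson(cases, mobility)
```

A coefficient near -1 corresponds to the pattern described for the four negatively correlated activities, and a coefficient near 0 indicates little linear relationship.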
Figure 1.20 Spearman Correlation Analysis of daily cases and community
activities
Figure 1.20 shows the correlation analysis using the Spearman correlation. The
figure shows positive correlations for retail_and_recreation,
grocery_and_pharmacy, parks, and transit_station, while workplaces and
residential do not show a positive correlation.
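The Spearman coefficient is the Pearson coefficient applied to the ranks of the values, which is why it can differ from the Pearson result when a relationship is monotonic but not linear. A minimal sketch, assuming tied values receive the average of their ranks (the function names are hypothetical):

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman([1, 2, 3, 4], [1, 4, 9, 16])` treats the two series as perfectly related because their orderings agree, whereas `pearson` on the same values is slightly lower because the relationship is not a straight line.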
2. Analysis of Community Activities on the Total Confirmed Cases of Covid 19 and
its Visualization.
This step begins with preparing the data to be used, namely the Total Case
variable and the global mobility variables, then visualizing it against each
mobility variable and analyzing the correlation.
Figure 1.21 Preparation of data for analysis of total cases and community activities
Figure 1.21 shows the coding of data preparation for visualization between the total cases
and community activities.
Figure 1.22 Visualization between Retail and recreation and total cases
Figure 1.22 shows a visualization between retail and recreation and total cases,
where when the total cases are high, retail and recreation decreases.
Figure 1.23 Visualization between grocery_and_pharmacy and total cases
Figure 1.23 shows that grocery_and_pharmacy activity is quite stable,
meaning that drug buying activity is quite stable; an increase occurs when the total
number of cases is high.
Figure 1.24 Visualization between parks and total cases
Figure 1.24 shows that community activities in parks at the beginning of the
pandemic were quite high, but when total cases were high, parks activities decreased.
Figure 1.25 Visualization between transit_station and daily cases
Figure 1.25 shows that transit_station activity is high when daily cases are low. When
daily cases go up, transit_station activity goes down.
Figure 1.26 Visualization between residential and daily cases
Figure 1.26 shows that when daily cases are low, residential activity is high. However,
when daily cases are high, residential activity drops drastically.
Figure 1.27 Pearson correlation analysis between total cases and community
activities
Figure 1.27 shows the correlation between total cases and community
mobility using the Pearson correlation formula. The highest correlation
is 0.748, between total cases and grocery_and_pharmacy,
which is a fairly high positive correlation.
Figure 1.28 Spearman Correlation Analysis between total cases and community
activities
Figure 1.28 shows the correlation analysis using the Spearman correlation. The
highest correlation value obtained is 0.784, between
grocery_and_pharmacy and total cases, which is a high positive
correlation.
3. Analysis of Community Activities on the Total Confirmed Deaths of Covid 19
and its Visualization.
This step begins with preparing the data to be used, namely the Total Died
variable and the global mobility variables, then visualizing it against each
mobility variable and analyzing the correlation.
Figure 1.29 Data preparation for analysis of total deaths by community activities
Figure 1.30 Visualization between Retail and recreation and Total death
Figure 1.30 shows a visualization between retail and recreation and total deaths:
when total deaths are low, retail and recreation is high, and when total
deaths are high, retail and recreation is low.
Figure 1.31 Visualization between grocery_and_pharmacy and Total died
Figure 1.31 shows a visualization between grocery_and_pharmacy and total
deaths: when total deaths are low, grocery_and_pharmacy is high, and when
total deaths are high, grocery_and_pharmacy is low.
Figure 1.32 Visualization between Parks and Total died
Figure 1.32 shows a visualization between parks and total deaths:
when total deaths are low, parks activity is high, and when total deaths
are high, parks activity is low.
Figure 1.33 Visualization between transit_station and Total died
Figure 1.33 shows a visualization between transit_station and total deaths:
when total deaths are low, transit_station is high, and when total deaths are
high, transit_station is low.
Figure 1.34 Visualization between workplaces and Total died
Figure 1.34 shows a visualization between workplaces and total
deaths: when total deaths are low, workplaces activity is high,
and when total deaths are high, workplaces activity is low.
Figure 1.35 Visualization between Residential and Total died
Figure 1.35 shows a visualization between residential and total deaths:
when total deaths are low, residential is high, and when total deaths
are high, residential is low.
Figure 1.36 Pearson Correlation Analysis between total deaths and community
activities
Figure 1.36 shows the correlation between total deaths and
community mobility using the Pearson correlation formula. The highest
correlation is 0.733, between total deaths and grocery and
pharmacy, which is a fairly high positive correlation (Prawoto, Priyo
Purnomo, & Az Zahra, 2020).
Figure 1.37 Spearman Correlation Analysis between total deaths and community
activities
Figure 1.37 shows the correlation value between total deaths and community
mobility using the Spearman correlation formula. It can be seen that the highest
correlation is 0.784, which is the correlation between total deaths and grocery and
pharmacy, meaning that the correlation is quite high and positive.
4. Prediction of Number of New Cases Per Day using Long Short Term Memory
(LSTM)
At the initial stage, the training and test sets are determined: dataset_train is
the number of new_cases_per_day, and dataset_test is likewise the number of
new_cases_per_day. The code is as follows:
Figure 1.38 Setting training data and testing data for prediction
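A step of this kind is commonly implemented as a chronological split followed by a sliding window that turns the series into input and target pairs. The sketch below is an assumption about how such a step could look, not the code shown in the figure. The names `chronological_split` and `make_windows`, the 80 percent training fraction, and the window length are all hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time series without shuffling: early part trains, late part tests."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, lookback):
    """Build (inputs, target) pairs: `lookback` past values predict the next one."""
    inputs, targets = [], []
    for t in range(lookback, len(series)):
        inputs.append(series[t - lookback:t])  # the previous `lookback` values
        targets.append(series[t])              # the value to predict
    return inputs, targets


# Made-up series standing in for new_cases_per_day.
new_cases = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
dataset_train, dataset_test = chronological_split(new_cases)
X_train, y_train = make_windows(dataset_train, lookback=3)
```

Splitting by time rather than at random keeps future observations out of the training set, which matters when the model is later evaluated on its ability to forecast.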
Then import the packages needed for prediction.
Figure 1.39 Importing packages needed for prediction
The next step is to build the model and train it over a number of epochs.
Figure 1.40 The process of building a predictive model
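To illustrate what training over epochs involves, the sketch below fits a simple linear model by gradient descent and records the training and validation loss after every epoch. It deliberately uses a linear model as a stand-in, not the LSTM from the figure, so that it runs with the standard library alone. The function name `train_linear`, the learning rate, the number of epochs, and the data are hypothetical.

```python
def train_linear(train, val, epochs=2000, lr=0.1):
    """Fit y = w*x + b by gradient descent; return w, b and per-epoch losses."""
    w, b = 0.0, 0.0
    history = {"loss": [], "val_loss": []}

    def mse(data):
        # Mean squared error of the current parameters on (x, y) pairs.
        return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

    for _ in range(epochs):
        # Gradients of the training mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        w -= lr * grad_w
        b -= lr * grad_b
        history["loss"].append(mse(train))      # training loss for this epoch
        history["val_loss"].append(mse(val))    # validation loss for this epoch
    return w, b, history


# Made-up data lying on the line y = 2x + 1.
train_pairs = [(0.0, 1.0), (0.2, 1.4), (0.4, 1.8), (0.6, 2.2), (0.8, 2.6)]
val_pairs = [(1.0, 3.0)]
w, b, history = train_linear(train_pairs, val_pairs)
```

The two lists in `history` are the kind of curves that are plotted as training loss and validation loss; when both fall and stay close together, the model fits unseen data about as well as the data it was trained on.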
Prediction results are visualized as follows:
Figure 1.41 Predicted number of cases per day
After making predictions, the model that has been obtained is tested and the
results are visualized.
Figure 1.42 Prediction Graph
From Figure 1.42 it can be seen that the validation loss (red) and training loss
(blue) curves are close together. This indicates that the model fits the validation
data about as well as the training data, so the predictions are quite accurate. This
is also shown in Figure 1.43, with a value of RMSE = 145.135, which is quite small.
Figure 1.43 Evaluation Results using RMSE
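The root mean squared error (RMSE) reported here is the square root of the average squared difference between observed and predicted values. A minimal sketch, with a hypothetical function name:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long numeric sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

RMSE is expressed in the same unit as the predicted quantity, so a value of 145.135 for new cases per day means that predictions deviate from the observed counts by roughly that many cases in the root-mean-square sense.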
5. Predicting the Number of Cumulative Cases using Long Short Term Memory
(LSTM)
At the initial stage, the training and test sets are determined: dataset_train is
the number of new cases per day, and dataset_test is the Cumulative Number of Cases.
The code is as follows:
Figure 1.43 Setting training data and testing data for prediction
Then import the packages needed for prediction.
Figure 1.44 Setting up the necessary packages for modeling
The next step is to build a model using epochs and then visualize it.
Figure 1.45 Model building process
Figure 1.46 Visualization of the built prediction model
Figure 1.47 Prediction graph
From Figure 1.47 it can be seen that the validation loss (red) and
training loss (blue) curves are close together. This indicates that the predictions
are quite accurate, which is also shown in Figure 1.48, with a value of
RMSE = 449516.694.
Figure 1.48 Evaluation Results using RMSE
6. Prediction of the Number of Cumulative Death Cases using Long Short Term
Memory (LSTM)
At the initial stage, the training and test sets are determined: dataset_train is
the cumulative number of death cases, and dataset_test is likewise the cumulative
number of death cases. The code is as follows:
Figure 1.49 Setting training data and testing data for prediction
Figure 1.50 Importing packages needed for prediction
The next step is to build a model using epochs and then visualize it.
Figure 1.51 The process of building a predictive model
Figure 1.52 Visualization of the built prediction model
Figure 1.53 Prediction graph
From Figure 1.53, it can be seen that the validation loss (red) and
training loss (blue) curves are close together. This indicates that the predictions
are quite accurate, which is also shown in Figure 1.54, with a value of
RMSE = 14331.656, which is quite small.
Figure 1.54 Evaluation Results using RMSE
CONCLUSION
This research was carried out on confirmed positive COVID-19 data downloaded from
Google Trends for the period from January 1, 2020 to November 10, 2021, with 617
records covering daily cases, total cases, and total deaths, together with global
mobility variables (community activities) including retail and recreation, grocery
and pharmacy, parks, transit stations, and workplaces. Based on the results, models
have been obtained to predict the number of cases per day, the number of cumulative
cases, and the number of cumulative deaths.
The best prediction result is the prediction of the number of cases per day, with
RMSE = 145.135. Meanwhile, the highest correlation found is 0.784, between the total
death variable and grocery and pharmacy.
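Because RMSE is expressed in the unit of the predicted quantity, values for targets of very different magnitude (daily cases versus cumulative cases) are easier to compare after normalization. The sketch below divides the RMSE by the range of the observed values. This normalization is an assumption added for illustration, not a step reported in this study, and the function name `nrmse` and the sample numbers are hypothetical.

```python
import math


def nrmse(actual, predicted):
    """RMSE divided by the range of the observed values (unitless)."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    span = max(actual) - min(actual)
    if span == 0:
        raise ValueError("observed values are constant; the range is zero")
    return math.sqrt(mse) / span


# Made-up series on two very different scales with the same relative error.
daily_score = nrmse([100, 200, 300], [110, 190, 310])
cumulative_score = nrmse([100000, 200000, 300000], [110000, 190000, 310000])
```

In this made-up example the two raw RMSE values differ by a factor of one thousand, while the normalized scores are the same.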
REFERENCES
Ahmetolan, Semra, Bilge, Ayse Humeyra, Demirci, Ali, Peker-Dobie, Ayse, & Ergonul,
Onder. (2020). What can we estimate from fatality and infectious case data using
the susceptible-infected-removed (SIR) model? A case study of Covid-19 pandemic.
Frontiers in Medicine, 7, 556366.
Devaraj, Jayanthi, Elavarasan, Rajvikram Madurai, Pugazhendhi, Rishi, Shafiullah, G.
M., Ganesan, Sumathi, Jeysree, Ajay Kaarthic, Khan, Irfan Ahmad, & Hossain,
Eklas. (2021). Forecasting of COVID-19 cases using deep learning models: Is it
reliable and practically significant? Results in Physics, 21, 103817.
Gers, Felix A., & Schmidhuber, E. (2001). LSTM recurrent networks learn simple
context-free and context-sensitive languages. IEEE Transactions on Neural
Networks, 12(6), 1333-1340.
Jun, Seung Pyo, Yoo, Hyoung Sun, & Choi, San. (2018). Ten years of research change
using Google Trends: From the perspective of big data utilizations and applications.
Technological Forecasting and Social Change, 130, 69-87.
Kondo, Kenjiro, Ishikawa, Akihiko, & Kimura, Masashi. (2019). Sequence to sequence
with attention for influenza prevalence prediction using google trends. Proceedings
of the 2019 3rd International Conference on Computational Biology and
Bioinformatics, 1-7.
Maaliw, Renato R., Mabunga, Zoren P., & Villa, Frederick T. (2021). Time-Series
Forecasting of COVID-19 Cases Using Stacked Long Short-Term Memory
Networks. 2021 International Conference on Innovation and Intelligence for
Informatics, Computing, and Technologies (3ICT), 435-441. IEEE.
Pan, Zhenhe, Nguyen, Hoang Long, Abu-Gellban, Hashim, & Zhang, Yuanlin. (2020).
Google trends analysis of covid-19 pandemic. 2020 IEEE International Conference
on Big Data (Big Data), 3438-3446. IEEE.
Prawoto, Nano, Priyo Purnomo, Eko, & Az Zahra, Abitassha. (2020). The impacts of
Covid-19 pandemic on socio-economic mobility in Indonesia.
Pretorius, A., Kruger, E., & Bezuidenhout, S. (2022). Google trends and water
conservation awareness: the internet’s contribution in South Africa. South African
Geographical Journal, 104(1), 53-69.
Sak, Hasim, Senior, Andrew W., & Beaufays, Françoise. (2014). Long short-term
memory recurrent neural network architectures for large scale acoustic modeling.
Shahid, Farah, Zameer, Aneela, & Muneeb, Muhammad. (2020). Predictions for
COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM. Chaos, Solitons &
Fractals, 140, 110212.
Sharpe Jr, Richard E., Kuszyk, Brian S., & Mossa-Basha, Mahmud. (2021). Special
report of the RSNA COVID-19 Task Force: the short-and long-term financial
impact of the COVID-19 pandemic on private radiology practices. Radiology.
Wen, Jun, Kozak, Metin, Yang, Shaohua, & Liu, Fang. (2020). COVID-19: potential
effects on Chinese citizens’ lifestyle and travel. Tourism Review, 76(1), 74-87.
Zhang, Kefei, Thé, Jesse, Xie, Guangyuan, & Yu, Hesheng. (2020). Multi-step ahead
forecasting of regional air quality using spatial-temporal deep neural networks: a
case study of Huaihai Economic Zone. Journal of Cleaner Production, 277, 123231.