Mae256 Regression Models Using Cross Answers

Subject Code : MAE256
University : Deakin University
Subject Name : Economics

Analytical Methods In Economics And Finance

The coronavirus disease (COVID-19) has spread rapidly around the world following its initial outbreak in the City of Wuhan, the capital of Hubei province of China. By the beginning of 2021, COVID-19 had affected almost all countries and territories across the world, with the global death toll exceeding two million. Although not all of the factors that contributed to the rapid spread of the virus are precisely known yet, it is believed that socio-economic activities requiring inter-personal interactions, certain long-term health conditions, and lifestyle may have contributed to the unprecedented spread of the disease. To capture the effects of such factors on the number of people infected, as an econometrician, you decide to choose variables representing the level of economic development, population characteristics and the geographical locations of various countries of the world as of 1 February 2021. The dataset [MAE256 T1 2021 Assignment Data] for the assignment is provided on the MAE256 unit site on CloudDeakin and contains information on the continent of each country (Continent), total number of infected people (Cases), Gross Domestic Product per capita (GDP), population density (POP), percentage of population aged more than 70 years (Pop70), and the prevalence of diabetes (Diabetes). The dataset for this assignment has been obtained from: https://ourworldindata.org/coronavirus-data.

NOTE: You need to use the dataset provided by the Unit Team on CloudDeakin for the assignment. Please include all Excel output tables for summary statistics and regressions, and all figures in your submission.

Variable definitions

Country: The name of each country in the dataset

Continent: The continent of each country in the dataset

Cases: Total number of infected people

GDP: Gross Domestic Product per person (in AUD)

POP: Population density (number of people per square kilometres of land area)

Pop70: Percentage of population who are aged over 70

Diabetes: Percentage of people aged 20-79 who have type 1 or type 2 diabetes

(i) Present the descriptive statistics of the variables Cases and GDP. Comment on the means and measures of dispersion (standard deviation, skewness, and kurtosis) of these two variables.

Solution: The descriptive statistics of the variables Cases and GDP are shown below.

Cases		GDP

Mean	582932.5057	Mean	23485.84714
Standard Error	175785.8242	Standard Error	1901.699237
Median	65817.5	Median	15075.20898
Mode	1	Mode	#N/A
Standard Deviation	2318774.276	Standard Deviation	25085.13579
Sample Variance	5.37671E+12	Sample Variance	629264037.7
Kurtosis	91.2485313	Kurtosis	4.849364972
Skewness	8.846743578	Skewness	1.929834591
Range	26321119	Range	149069.6923
Minimum	1	Minimum	847.7435897
Maximum	26321120	Maximum	149917.4359
Sum	101430256	Sum	4086537.403
Count	174	Count	174
Largest(1)	26321120	Largest(1)	149917.4359
Smallest(1)	1	Smallest(1)	847.7435897
Confidence Level(95.0%)	346961.0213	Confidence Level(95.0%)	3753.519445

The average number of infected individuals is about 582,933 per country, estimated with a standard error of about 175,786. The median (about 65,818) is far below the mean, and the skewness (8.85) and kurtosis (91.25) are far above the values of roughly zero that Excel reports for a normal distribution. The distribution of Cases is therefore strongly right-skewed, with a sharp peak and a long right tail. The range is also very large, from 1 to 26,321,120 cases. Cases does not follow a Gaussian distribution, so a transformation such as the logarithm is needed to reduce the skewness and kurtosis and make the variable better suited to further statistical analysis.

GDP per person has a mean of about 23,486 AUD, with a standard error of about 1,902 and a standard deviation of about 25,085. The skewness (1.93) and kurtosis (4.85) are much smaller than those of Cases, but the distribution is still right-skewed: the mean lies well above the median (15,075). GDP is therefore only roughly normal, and a transformation can still provide better insights in later statistical analysis.
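The standard errors in the table can be cross-checked against the standard deviations, since the standard error of a mean is the standard deviation divided by the square root of the sample size. The following is a minimal Python sketch that uses only figures copied from the Excel output; the function name `standard_error` is illustrative and not part of the assignment.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of the sample mean: s / sqrt(n)."""
    return std_dev / math.sqrt(n)


n = 174  # Count reported in the descriptive statistics table

# Standard deviations copied from the Excel output
se_cases = standard_error(2318774.276, n)
se_gdp = standard_error(25085.13579, n)

print(f"SE(Cases) = {se_cases:,.1f}")
print(f"SE(GDP)   = {se_gdp:,.1f}")
```

Both results should agree with the Standard Error rows of the table up to rounding, which helps to keep the standard error and the standard deviation apart when commenting on dispersion.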

(ii) Estimate the following simple regression model of Cases on GDP:

Cases = b0 + b1GDP + u

Write down the estimated sample regression function and interpret both estimated coefficients.

Solution:

Regression Statistics
Multiple R	0.281878954
R Square	0.079455745
Adjusted R Square	0.073675398
Standard Error	2294367.428
Observations	174

ANOVA
	df	SS	MS	F	Significance F
Regression	1	7.86055E+13	7.86E+13	14.9323	0.000157691
Residual	173	9.10693E+14	5.26E+12
Total	174	9.89299E+14

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	0	#N/A	#N/A	#N/A	#N/A	#N/A
GDP	19.58937467	5.06940753	3.864234	0.000157	9.583523391	29.59523

Model: Cases = b0 + b1GDP + u

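Estimated: Cases = 0 + 19.59 GDP

The Excel output reports an intercept of exactly 0 with #N/A statistics, which indicates that the regression was run with the constant forced to zero, so b0 is not estimated here. The R Square of 0.079 means that GDP explains about 7.9% of the variation in Cases; because the total sum of squares is then measured around zero rather than around the mean, this figure is not directly comparable with the R Square of a model that includes an intercept. The regression is significant, as F(1,173) = 14.93 with a p-value of 0.0002 < 0.05, so we reject the hypothesis of no relationship at the 5% level of significance. The slope indicates that each additional 1 AUD of GDP per person is associated with about 19.59 more infected cases on average.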
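The headline figures of this regression can be reproduced from the ANOVA and coefficient rows. The sketch below is only a consistency check on the reported Excel numbers, not a re-estimation of the model; the helper `predicted_change_in_cases` is a hypothetical name introduced for illustration.

```python
# Values copied from the Excel output for the regression of Cases on GDP
ss_regression = 7.86055e13
ss_total = 9.89299e14
coefficient = 19.58937467
std_error = 5.06940753

r_squared = ss_regression / ss_total   # share of variation explained
t_stat = coefficient / std_error       # t statistic for H0: b1 = 0
f_stat = t_stat ** 2                   # with one regressor, F equals t squared


def predicted_change_in_cases(delta_gdp_aud: float) -> float:
    """Change in predicted Cases for a given change in GDP per person (AUD)."""
    return coefficient * delta_gdp_aud


print(f"R Square = {r_squared:.4f}")
print(f"t = {t_stat:.4f}, F = {f_stat:.4f}")
print(f"Extra cases per 1,000 AUD of GDP: {predicted_change_in_cases(1000):,.0f}")
```

The slope is measured in cases per AUD, so it translates a change in GDP into a change in the number of cases, not into a percentage.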

(iii) Now estimate the following simple regression model with a log-log specification:

log(Cases) = b0 + b1 log(GDP) + u

Report your regression results in a sample regression function. Interpret the estimated coefficient of log(GDP). Provide an explanation on the sign of the slope coefficient.

Solution:

Regression Statistics
Multiple R	0.970323
R Square	0.941528
Adjusted R Square	0.935747
Standard Error	2.675318
Observations	174

ANOVA
	df	SS	MS	F	Significance F
Regression	1	19937.84	19937.84	2785.656	3.551E-108
Residual	173	1238.217	7.157324
Total	174	21176.06

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	0	#N/A	#N/A	#N/A	#N/A	#N/A
ln(GDP)	1.122555	0.021269	52.77932	1.4E-108	1.080575459	1.164535

This is a regression in which both GDP and Cases have been transformed using the natural logarithm. The R Square is 0.94, but the output again shows the constant forced to zero, so the total sum of squares is measured around zero rather than around the mean and this figure overstates the fit relative to a model with an intercept. The regression is significant, with F(1,173) = 2786 and a p-value < 0.05.

Model: log(Cases) = b0 + b1 log(GDP) + u

Estimated: log(Cases) = 0 + 1.123 log(GDP)

The intercept is zero because the constant was forced to zero in the estimation. The slope of 1.123 is an elasticity and is positive: a 1% increase in GDP per person is associated with an increase of about 1.12% in the number of infected cases.

In a log-log model the slope is read directly as a percentage response, so no further transformation such as exp(1.123) is required. The positive sign is consistent with the idea that higher income goes together with more socio-economic activity involving inter-personal interaction, which may contribute to the spread of the disease.
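In a log-log model the slope can be read as an elasticity, which is an approximation that works well for small percentage changes. The following Python sketch, which assumes the slope of 1.122555 from the Excel output and uses an illustrative function name, compares that reading with the exact percentage change implied by the model.

```python
elasticity = 1.122555  # coefficient of ln(GDP) from the Excel output


def pct_change_in_cases(pct_change_gdp: float, b: float = elasticity) -> float:
    """Exact percentage change in Cases implied by a log-log slope b."""
    return 100 * ((1 + pct_change_gdp / 100) ** b - 1)


for pct in (1.0, 10.0):
    approx = elasticity * pct           # elasticity rule of thumb
    exact = pct_change_in_cases(pct)    # implied by the fitted model
    print(f"GDP +{pct:.0f}%: approx {approx:.2f}%, exact {exact:.2f}%")
```

For a 1% change in GDP the two figures are almost identical, which is why the coefficient itself is reported as the percentage effect.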

(iv) Estimate an extended log-log model that relates the number of cases to the countries’ GDP and population density:

log(Cases) = b0 + b1 log(GDP) + b2 log(POP) + u

Report your results in a sample regression function. Based on your estimates, how would you interpret the effect of POP on the number of cases? What can you conclude when you compare the goodness of fit of this regression model and that of the regression model in part (iii)?

Solution:

This is another kind of log-log relationship.

Regression Statistics
Multiple R	0.970465073
R Square	0.941802457
Adjusted R Square	0.935650146
Standard Error	2.67676773
Observations	174

ANOVA
	df	SS	MS	F	Significance F
Regression	2	19943.67	9971.833	1391.726	1.5682E-106
Residual	172	1232.395	7.165085
Total	174	21176.06

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	0	#N/A	#N/A	#N/A	#N/A	#N/A
ln(GDP)	1.065870695	0.066385	16.05583	4.88E-36	0.934835971	1.196905
ln(POP)	0.126368468	0.140185	0.901444	0.368613	-0.150335123	0.403072

All variables enter in logarithmic form. The R Square is high (0.94, measured with the constant forced to zero). The regression is significant at the 5% level of significance, as F(2,172) = 1392 with a p-value < 0.05. Individually, log(GDP) is a significant variable in estimating the number of cases (p-value < 0.05), whereas log(POP) is not (p-value = 0.369 > 0.05).

Model:

log(Cases) = b0 + b1 log(GDP) + b2 log(POP) + u

log(Cases) = 0 + 1.066 log(GDP) + 0.126 log(POP)

A 1% increase in population density is associated with an increase of about 0.13% in the number of infected cases, holding GDP constant. This effect is small and not statistically significant (p-value = 0.369).

Compared with the model in part (iii), R Square barely changes (0.9418 against 0.9415) and Adjusted R Square falls slightly (0.93565 against 0.93575). Hence adding log(POP) makes no meaningful contribution, and in terms of goodness of fit the model in part (iii) fits the data just as well.

(v) Using the estimated model in (iv), test whether the coefficient of log(GDP) is greater than 1 at 5% level.

Solution:

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	0	#N/A	#N/A	#N/A	#N/A	#N/A
ln(GDP)	1.065870695	0.066385	16.05583	4.88E-36	0.934835971	1.196905
ln(POP)	0.126368468	0.140185	0.901444	0.368613	-0.150335123	0.403072

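The hypotheses are H0: b1 = 1 against H1: b1 > 1. The t statistic and p-value in the Excel output test whether the coefficient equals 0, not 1, so they cannot be used directly for this test. The appropriate statistic is t = (1.0659 - 1)/0.0664 = 0.99. The one-sided 5% critical value with 172 degrees of freedom is about 1.65. Since 0.99 < 1.65, we do not reject the null hypothesis; the 95% confidence interval (0.935, 1.197) also contains 1. Although the point estimate of 1.066 is above 1, there is not enough evidence at the 5% level of significance to conclude that the coefficient of log(GDP) is greater than 1.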
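A test of whether a coefficient exceeds 1 measures the estimate against 1 rather than against 0. The sketch below works through that calculation with the coefficient and standard error from the table above; the critical value of 1.654 is an approximate figure taken from standard t tables for 172 degrees of freedom and should be treated as an assumption.

```python
# Estimate and standard error of ln(GDP) copied from the Excel output
b1 = 1.065870695
se_b1 = 0.066385
hypothesised_value = 1.0

# Approximate one-sided 5% critical value of t with 172 degrees of freedom
# (assumed from standard tables, not produced by the Excel output)
critical_value = 1.654

t_stat = (b1 - hypothesised_value) / se_b1
reject_h0 = t_stat > critical_value

print(f"t = {t_stat:.3f}, critical value = {critical_value}")
print("Reject H0: b1 = 1" if reject_h0 else "Do not reject H0: b1 = 1")
```

The t Stat column in Excel divides the coefficient by its standard error, which corresponds to a hypothesised value of 0, so it answers a different question.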

(vi) Add the variables Pop70 and Diabetes to the log-log equation in (iii) and estimate the following model:

log(Cases)= b0 + b1 log(GDP) +b2 Pop70 +b3 Diabetes + u

Interpret the coefficient of Pop70. Test whether Pop70 and Diabetes are jointly significant at 5% level of significance.

Solution:

Regression Statistics
Multiple R	0.972645
R Square	0.946037
Adjusted R Square	0.939558
Standard Error	2.585062
Observations	174

ANOVA
	df	SS	MS	F	Significance F
Regression	3	20033.35	6677.782	999.2875	1.0722E-107
Residual	171	1142.715	6.682544
Total	174	21176.06

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	0	#N/A	#N/A	#N/A	#N/A	#N/A
ln(GDP)	1.232289	0.06669	18.47797	1.31E-42	1.100647673	1.363929
Pop70	0.053938	0.054051	0.997908	0.319734	-0.052754826	0.16063
Diabetes	-0.17274	0.054876	-3.14777	0.001942	-0.281057747	-0.06442

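The coefficient of Pop70 is 0.054: a one percentage point increase in the share of the population aged over 70 is associated with an increase of about 5.4% in the number of cases, holding GDP and Diabetes constant. However, the coefficient is insignificant in the model (p-value = 0.320 > 0.05), so there is no evidence that Pop70 makes a separate contribution to the number of infections.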

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	2.474843133	1.910687616	1.295263084	0.196996	-1.29705	6.246732
ln(GDP)	0.910243025	0.252205332	3.609134735	0.000404	0.412364	1.408122
Pop70	0.146710908	0.144840864	1.01291102	0.312551	-0.13922	0.432641
Diabetes	-0.128867729	0.08948488	-1.440106183	0.151687	-0.30552	0.047784
pop70_diabetes	-0.00610965	0.018036742	-0.33873358	0.735231	-0.04172	0.029497

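The table above adds the interaction term pop70_diabetes (and an intercept) to the model. Its p-value of 0.735 > 0.05 shows only that the interaction between Pop70 and Diabetes is insignificant; it does not test whether the two variables are jointly significant. The joint test of H0: b2 = b3 = 0 compares the residual sum of squares of the restricted model in part (iii), 1238.217, with that of the unrestricted model with Pop70 and Diabetes, 1142.715: F = [(1238.217 - 1142.715)/2] / (1142.715/171) = 7.15. This exceeds the 5% critical value of F(2,171), which is about 3.05, so we reject the null hypothesis. Pop70 and Diabetes are jointly significant at the 5% level of significance, which is in line with Diabetes being individually significant (p-value = 0.002).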
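One standard way to test joint significance is an F test that compares the residual sums of squares of the model without the two variables and the model with them. The following Python sketch applies it to the Residual SS figures in the ANOVA tables of parts (iii) and (vi); the function name is illustrative, and the critical value of 3.05 is an approximate table value for F(2,171) that should be treated as an assumption.

```python
def joint_f_statistic(ssr_restricted: float, ssr_unrestricted: float,
                      q: int, df_unrestricted: int) -> float:
    """F statistic for q exclusion restrictions on nested regressions."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


# Residual SS from the ANOVA tables: part (iii) without, part (vi) with
# Pop70 and Diabetes; 171 is the residual df of the larger model
f_stat = joint_f_statistic(1238.217, 1142.715, q=2, df_unrestricted=171)

# Approximate 5% critical value of F(2, 171), assumed from standard tables
critical_value = 3.05

print(f"F = {f_stat:.3f}, critical value = {critical_value}")
print("Jointly significant" if f_stat > critical_value else "Not jointly significant")
```

The p-value of an interaction term tests a different hypothesis, namely whether the effect of one variable depends on the level of the other.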

(vii) Create a dummy variable indicating whether or not a country is in Oceania. Add the variable Oceania to the log-log equation in (iv) and estimate the following:

log(Cases)= b0 + b1 log(GDP)+ b2 log(POP)+b3 Oceania + u

Report your regression results in a sample regression function. Interpret the meaning of the coefficient for Oceania.

Solution:

Regression Statistics
Multiple R	0.978114
R Square	0.956706
Adjusted R Square	0.950352
Standard Error	2.31546
Observations	174

ANOVA
	df	SS	MS	F	Significance F
Regression	3	20259.27	6753.09	1259.587	7.9E-116
Residual	171	916.7915	5.361354
Total	174	21176.06

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	0	#N/A	#N/A	#N/A	#N/A	#N/A
ln(GDP)	1.146681	0.058383	19.64081	1.07E-45	1.031438	1.261925
ln(POP)	0.012706	0.122164	0.10401	0.917283	-0.22844	0.25385
Oceania	-6.46518	0.84265	-7.67244	1.23E-12	-8.12852	-4.80184

log(Cases) = 0 + 1.147 log(GDP) + 0.013 log(POP) - 6.465 Oceania

The R Square is 0.96, measured with the constant forced to zero. The variables log(GDP) and Oceania are significant, as their p-values are below 0.05, while log(POP) is not (p-value = 0.917). Hence GDP and Oceania contribute to explaining the number of infections.

The coefficient of Oceania is -6.47. Oceania is a dummy variable, so this is not an elasticity: holding GDP and population density constant, log(Cases) is 6.47 lower for a country in Oceania than for a country elsewhere. Since exp(-6.47) = 0.0015, the predicted number of cases in an Oceanian country is only about 0.15% of that in a comparable country outside Oceania, that is, about 99.8% lower.
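When the dependent variable is in logs, the coefficient of a dummy variable is usually converted into a percentage difference with 100 × (exp(b) − 1). The sketch below applies this to the Oceania coefficient from the table above; the function name is illustrative.

```python
import math


def dummy_effect_pct(coefficient: float) -> float:
    """Percentage difference in the level of y implied by a dummy
    coefficient in a regression whose dependent variable is log(y)."""
    return 100 * (math.exp(coefficient) - 1)


oceania_coefficient = -6.46518  # copied from the Excel output

ratio = math.exp(oceania_coefficient)           # Oceania relative to elsewhere
effect = dummy_effect_pct(oceania_coefficient)  # percentage difference

print(f"Cases ratio (Oceania / elsewhere) = {ratio:.4f}")
print(f"Percentage difference = {effect:.2f}%")
```

A negative coefficient always gives a ratio below 1 and therefore a decrease, never an increase, in the predicted number of cases.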

(viii) Using the model estimated in (vii), test whether the model is overall statistically significant at 1% level.

Solution:

Regression Statistics
Multiple R	0.62787987
R Square	0.394233131
Adjusted R Square	0.383543128
Standard Error	2.30696728
Observations	174

ANOVA
	df	SS	MS	F	Significance F
Regression	3	588.8157	196.2719	36.87867	2.0737E-18
Residual	170	904.7567	5.322098
Total	173	1493.572

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 99.0%	Upper 99.0%
Intercept	2.210725098	1.470129	1.503763	0.134498	-0.6913336	5.112784	-1.61905	6.040496
ln(GDP)	0.945645462	0.145795	6.486126	9.23E-10	0.65784349	1.233447	0.565841	1.32545
ln(POP)	-0.050070733	0.128676	-0.38912	0.697673	-0.3040798	0.203938	-0.38528	0.285138
Oceania	-6.635032667	0.847123	-7.83243	4.94E-13	-8.3072683	-4.9628	-8.84184	-4.42823

The null hypothesis is that all slope coefficients are zero, H0: b1 = b2 = b3 = 0. This version of the model includes an intercept, and its R Square is 0.394. The test statistic is F(3,170) = 36.88 with Significance F = 2.07E-18, which is far below 0.01. We therefore reject the null hypothesis and conclude that the model is overall statistically significant at the 1% level. Individually, log(GDP) and Oceania are significant, as their p-values are below 0.01, while log(POP) is not (p-value = 0.698).

The coefficient of Oceania is -6.64. Since exp(-6.64) = 0.0013, the predicted number of cases in an Oceanian country is about 99.9% lower than in a comparable country elsewhere, holding GDP and population density constant.
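The overall F statistic of a regression with an intercept can be recovered either from R Square or from the mean squares in the ANOVA table. The following Python sketch checks both routes against the Excel output above; the function name is illustrative.

```python
def overall_f(r_squared: float, k: int, n: int) -> float:
    """Overall F statistic of a regression with an intercept,
    k slope coefficients and n observations."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


# Figures copied from the Excel output of the model with an intercept
f_from_r2 = overall_f(0.394233131, k=3, n=174)
f_from_ms = 196.2719 / 5.322098  # MS Regression / MS Residual

significance_f = 2.0737e-18      # reported by Excel
significant_at_1pct = significance_f < 0.01

print(f"F from R Square = {f_from_r2:.3f}")
print(f"F from mean squares = {f_from_ms:.3f}")
print(f"Significant at 1%: {significant_at_1pct}")
```

Both routes should give the F value shown in the ANOVA table, and the decision at the 1% level rests on Significance F rather than on the individual p-values.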

(ix) Create a dummy variable indicating whether or not a country is in Europe. Add the variable Europe to the log-log equation in (iv) and estimate the following:

log(Cases)= b0 + b1 log(GDP)+ b2 log(POP)+b3 Europe+ u

Test whether Europe has a significant effect at the 1% level of significance. What do you infer about the explanatory power of the model in part (ix) compared to the model that you estimated in part (vii)?

Solution:

Regression Statistics
Multiple R	0.971197
R Square	0.943225
Adjusted R Square	0.936713
Standard Error	2.65158
Observations	174

ANOVA
	df	SS	MS	F	Significance F
Regression	3	19973.78	6657.927	946.9552	8.1E-106
Residual	171	1202.28	7.030879
Total	174	21176.06

	Coefficients	Standard Error	t Stat	P-value	Lower 95%	Upper 95%	Lower 99.0%	Upper 99.0%
Intercept	0	#N/A	#N/A	#N/A	#N/A	#N/A	#N/A	#N/A
ln(GDP)	1.02528	0.068623	14.94074	7.79E-33	0.889822	1.160737	0.846524	1.204035
ln(POP)	0.156855	0.139645	1.123242	0.262909	-0.11879	0.432504	-0.2069	0.520613
Europe	1.036708	0.500926	2.069582	0.039995	0.047913	2.025502	-0.26815	2.341562

The Europe variable is insignificant at the 1% level, as its p-value is 0.040 > 0.01, and the 99% confidence interval (-0.268, 2.342) contains zero. It would be significant at the 5% level. Hence at the 1% level of significance we cannot conclude that being in Europe increases or decreases the number of infections.
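The same decision can be reached by comparing the t statistic with the 1% critical value. In the sketch below the critical value is not taken from tables but backed out of the 99% confidence interval that Excel reports for Europe, which is an assumption about how that interval was built (estimate plus or minus critical value times standard error).

```python
# Figures copied from the Europe row of the Excel output
coefficient = 1.036708
std_error = 0.500926
upper_99 = 2.341562

t_stat = coefficient / std_error

# Two-sided 1% critical value implied by the reported 99% interval
t_critical_99 = (upper_99 - coefficient) / std_error

significant_at_1pct = abs(t_stat) > t_critical_99

print(f"t = {t_stat:.3f}, implied 1% critical value = {t_critical_99:.3f}")
print(f"Significant at 1%: {significant_at_1pct}")
```

The t statistic falls short of the 1% critical value, which is the same information as a p-value above 0.01.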

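Compared with the model in part (vii), the model with Europe has lower explanatory power: its R Square is 0.943 against 0.957, its Adjusted R Square is 0.937 against 0.950, and its standard error is larger (2.65 against 2.32). The Oceania dummy has a significant effect on the number of infections, whereas the Europe dummy does not at the 1% level. Hence replacing Oceania with Europe does not improve the model, and the model in part (vii) fits the data better.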
