Mae256 Regression Models Using Cross Answers


  • Subject Code : MAE256
  • University : Deakin University
  • Subject Name : Economics

Analytical Methods In Economics And Finance

The coronavirus disease (COVID-19) has spread rapidly around the world following its initial outbreak in the City of Wuhan, the capital of Hubei province of China. By the beginning of 2021, COVID-19 had affected almost all countries and territories across the world, with the global death toll exceeding two million. Although not all of the factors that contributed to the rapid spread of the virus are precisely known yet, it is believed that socio-economic activities requiring inter-personal interactions, certain long-term health conditions, and lifestyle may have contributed to the unprecedented spread of the disease. To capture the effects of such factors on the number of people infected, as an econometrician, you decide to choose variables representing the level of economic development, population characteristics and the geographical locations of various countries of the world as of 1 February 2021. The dataset [MAE256 T1 2021 Assignment Data] for the assignment is provided on the MAE256 unit site on CloudDeakin and contains information on the continent of each country (Continent), total number of infected people (Cases), Gross Domestic Product per capita (GDP), population density (POP), percentage of population aged more than 70 years (Pop70), and the prevalence of diabetes (Diabetes). The dataset for this assignment has been obtained from: https://ourworldindata.org/coronavirus-data.

NOTE: You need to use the dataset provided by the Unit Team on CloudDeakin for the assignment. Please include all Excel output tables for summary statistics and regressions, and all figures in your submission.

Variable definitions

Country: The name of each country in the dataset

Continent: The continent of each country in the dataset

Cases: Total number of infected people

GDP: Gross Domestic Product per person (in AUD)

POP: Population density (number of people per square kilometres of land area)

Pop70: Percentage of population who are aged over 70

Diabetes: Percentage of people aged 20-79 who have type 1 or type 2 diabetes

(i) Present the descriptive statistics of the variables Cases and GDP. Comment on the means and measures of dispersion (standard deviation, skewness, and kurtosis) of these two variables.

Solution: Let us have a closer look at the descriptive statistics of the variables Cases and GDP.

| Statistic | Cases | GDP |
|---|---|---|
| Mean | 582932.5057 | 23485.84714 |
| Standard Error | 175785.8242 | 1901.699237 |
| Median | 65817.5 | 15075.20898 |
| Mode | 1 | #N/A |
| Standard Deviation | 2318774.276 | 25085.13579 |
| Sample Variance | 5.37671E+12 | 629264037.7 |
| Kurtosis | 91.2485313 | 4.849364972 |
| Skewness | 8.846743578 | 1.929834591 |
| Range | 26321119 | 149069.6923 |
| Minimum | 1 | 847.7435897 |
| Maximum | 26321120 | 149917.4359 |
| Sum | 101430256 | 4086537.403 |
| Count | 174 | 174 |
| Largest(1) | 26321120 | 149917.4359 |
| Smallest(1) | 1 | 847.7435897 |
| Confidence Level(95.0%) | 346961.0213 | 3753.519445 |

The average number of infected individuals per country is about 582,933, estimated with a standard error of about 175,786. The median (65,818) is far below the mean, and the standard deviation (about 2.32 million) is roughly four times the mean, so the data are widely dispersed; the values range from 1 to 26,321,120 cases. The skewness (8.85) and kurtosis (91.25) are far above the values expected for a normal distribution (Excel reports excess kurtosis, so both would be close to 0), which indicates a sharply peaked distribution with a very long right tail. Cases therefore does not follow a Gaussian distribution in levels. A transformation such as the logarithm reduces the skewness and kurtosis and makes the variable better suited to the regression analysis that follows.

GDP per person has a mean of about 23,486 AUD, with a standard deviation of about 25,085 and a standard error of the mean of about 1,902. The median (15,075) is again below the mean. The skewness (1.93) and kurtosis (4.85) are much smaller than for Cases, so the distribution is considerably closer to normal, although it is still right-skewed. A logarithmic transformation can still help in later statistical analysis.
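For readers who want to reproduce these statistics outside Excel, the following is a minimal sketch using only the Python standard library. It assumes the bias-corrected sample formulas that Excel's SKEW and KURT functions use (KURT returns excess kurtosis); the function name `describe` and the toy data are illustrative and are not part of the assignment dataset.

```python
import math

def describe(values):
    """Mean, sample std dev, standard error, skewness and excess kurtosis.

    Uses the bias-corrected sample formulas (assumed to match Excel's
    SKEW and KURT). Requires at least 4 observations.
    """
    n = len(values)
    mean = sum(values) / n
    # Sample variance divides by n - 1.
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    sd = math.sqrt(var)
    z3 = sum(((x - mean) / sd) ** 3 for x in values)
    z4 = sum(((x - mean) / sd) ** 4 for x in values)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard error of the mean
        "skewness": skew,
        "kurtosis": kurt,
    }

# Toy data with one large value, mimicking a long right tail.
toy = [1, 2, 3, 4, 50]
print(describe(toy))
```

A single extreme observation is enough to pull the mean well above the median and to produce a large positive skewness, which is the pattern seen in Cases.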

(ii) Estimate the following simple regression model of Cases on GDP:

Cases = b0 + b1GDP + u

Write down the estimated sample regression function and interpret both estimated coefficients.

Solution:

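Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.281878954 |
| R Square | 0.079455745 |
| Adjusted R Square | 0.073675398 |
| Standard Error | 2294367.428 |
| Observations | 174 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 7.86055E+13 | 7.86E+13 | 14.9323 | 0.000157691 |
| Residual | 173 | 9.10693E+14 | 5.26E+12 | | |
| Total | 174 | 9.89299E+14 | | | |

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A |
| GDP | 19.58937467 | 5.06940753 | 3.864234 | 0.000157 | 9.583523391 | 29.59523 |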

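Model: Cases = b0 + b1GDP + u

Estimated sample regression function: Cases = 0 + 19.59 GDP

The Excel output above was produced with the intercept constrained to zero (the Intercept row shows 0 with #N/A statistics, and the total degrees of freedom equal the number of observations, 174). The intercept b0 is therefore not estimated: predicted Cases is 0 when GDP is 0 by construction, and only the slope can be interpreted. The R Square of 0.079 means that GDP accounts for about 8% of the variation in Cases; because the regression has no intercept, this R Square is measured around zero rather than around the sample mean. The regression is significant at the 5% level, since F(1,173) = 14.93 with p-value = 0.00016 < 0.05. The slope indicates that an increase of 1 AUD in GDP per person is associated with about 19.59 additional infected cases on average.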
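The Intercept row in the output above shows 0 with #N/A statistics, which is what Excel prints when the constant is set to zero. As an illustration of how that choice changes the slope, here is a minimal sketch of simple OLS with and without an intercept. The function names and the toy data are hypothetical and do not come from the assignment dataset.

```python
def ols_through_origin(x, y):
    """Slope of y = b1*x + u with the intercept forced to zero."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def ols_with_intercept(x, y):
    """Return (b0, b1) for y = b0 + b1*x + u."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Toy data generated from y = 1 + 2x (no noise).
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

print(ols_with_intercept(x, y))  # recovers intercept 1 and slope 2
print(ols_through_origin(x, y))  # slope absorbs the omitted intercept
```

On the toy data the slope through the origin is larger than the true slope because it has to absorb the omitted intercept, so the two specifications should not be interpreted interchangeably.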

(iii) Now estimate the following simple regression model with a log-log specification:

log(Cases) = b0 + b1 log(GDP) + u

Report your regression results in a sample regression function. Interpret the estimated coefficient of log(GDP). Provide an explanation on the sign of the slope coefficient.

Solution:

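Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.970323 |
| R Square | 0.941528 |
| Adjusted R Square | 0.935747 |
| Standard Error | 2.675318 |
| Observations | 174 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 19937.84 | 19937.84 | 2785.656 | 3.551E-108 |
| Residual | 173 | 1238.217 | 7.157324 | | |
| Total | 174 | 21176.06 | | | |

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A |
| ln(GDP) | 1.122555 | 0.021269 | 52.77932 | 1.4E-108 | 1.080575459 | 1.164535 |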

In this regression both Cases and GDP have been transformed with the natural logarithm. The reported R Square is 0.94, and the regression is significant with F(1,173) = 2786 and p-value < 0.05. As in part (ii), the output was estimated with the intercept constrained to zero, so this R Square is measured around zero and is not comparable with the R Square of a model that includes an intercept.

Model: log(Cases) = b0 + b1 log(GDP) + u

Estimated sample regression function: log(Cases) = 0 + 1.123 log(GDP)

The intercept is fixed at zero. The slope of 1.123 is an elasticity: a 1% increase in GDP per person is associated with an increase of about 1.12% in the number of infection cases. A log-log coefficient is read directly as a percentage response, so no exponentiation is needed.

The sign of the slope is positive. A plausible explanation is that more economically developed countries have more socio-economic activity that requires inter-personal interaction, which favours the spread of the virus.
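In a log-log model the slope is an elasticity, so it is read as the percentage change in Cases for a 1% change in GDP. The sketch below checks that reading against the exact change implied by the fitted equation, using the ln(GDP) coefficient from the table above; the function name is illustrative.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact % change in y implied by log(y) = b0 + elasticity * log(x)."""
    return 100 * ((1 + pct_change_in_x / 100) ** elasticity - 1)

b1 = 1.122555  # ln(GDP) coefficient reported in the regression output

# A 1% rise in GDP: the exact change is very close to b1 itself.
print(round(pct_change_in_y(b1, 1), 3))

# For larger changes the approximation "b1 percent per 1%" is less exact.
print(round(pct_change_in_y(b1, 10), 3))
```

For small changes the exact figure and the coefficient agree to about two decimal places, which is why the coefficient can be quoted directly as the elasticity.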

(iv) Estimate an extended log-log model that relates the number of cases to the countries’ GDP and population density:

log(Cases) = b0 + b1 log(GDP) + b2 log(POP) + u

Report your results in a sample regression function. Based on your estimates, how would you interpret the effect of POP on the number of cases? What can you conclude when you compare the goodness of fit of this regression model and that of the regression model in part (iii)?

Solution:

This model is also a log-log specification, now with two regressors.

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.970465073 |
| R Square | 0.941802457 |
| Adjusted R Square | 0.935650146 |
| Standard Error | 2.67676773 |
| Observations | 174 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 19943.67 | 9971.833 | 1391.726 | 1.5682E-106 |
| Residual | 172 | 1232.395 | 7.165085 | | |
| Total | 174 | 21176.06 | | | |

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A |
| ln(GDP) | 1.065870695 | 0.066385 | 16.05583 | 4.88E-36 | 0.934835971 | 1.196905 |
| ln(POP) | 0.126368468 | 0.140185 | 0.901444 | 0.368613 | -0.150335123 | 0.403072 |

All variables enter in logarithms. The reported R Square is high (0.942), and the regression is significant at the 5% level, since F(2,172) = 1392 with p-value < 0.05. log(GDP) is individually significant (p-value < 0.05), but log(POP) is not (t = 0.90, p-value = 0.369).

Model:

log(Cases) = b0 + b1 log(GDP) + b2 log(POP) + u

Estimated sample regression function: log(Cases) = 0 + 1.066 log(GDP) + 0.126 log(POP)

Holding GDP constant, a 1% increase in population density is associated with an increase of about 0.13% in the number of infected cases. This estimated effect is small and not statistically different from zero at the 5% level.

Compared with the model in part (iii), R Square barely changes (0.9418 against 0.9415) and Adjusted R Square falls slightly (0.93565 against 0.93575). Adding log(POP) therefore makes no meaningful contribution to the goodness of fit, and the simpler model in part (iii) fits the data essentially as well.

(v) Using the estimated model in (iv), test whether the coefficient of log(GDP) is greater than 1 at the 5% level.

Solution:

 

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A |
| ln(GDP) | 1.065870695 | 0.066385 | 16.05583 | 4.88E-36 | 0.934835971 | 1.196905 |
| ln(POP) | 0.126368468 | 0.140185 | 0.901444 | 0.368613 | -0.150335123 | 0.403072 |

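The hypotheses are H0: b1 = 1 against H1: b1 > 1. The P-value in the table tests H0: b1 = 0, so it cannot be used for this question. The test statistic is t = (1.0659 - 1) / 0.0664 = 0.99 with 172 degrees of freedom, and the one-sided 5% critical value is about 1.654. Since 0.99 < 1.654, we fail to reject H0 at the 5% level: although the point estimate exceeds 1, there is not enough evidence to conclude that the coefficient of log(GDP) is greater than 1. Consistent with this, the 95% confidence interval (0.935, 1.197) contains 1.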
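A test of whether a coefficient exceeds 1 uses the t statistic (estimate - 1) / standard error rather than the t Stat column, which tests against zero. A minimal sketch with the ln(GDP) estimate and standard error from the table above follows; the critical value 1.654 is an approximate one-sided 5% value for 172 degrees of freedom taken from standard t tables, hard-coded here as an assumption.

```python
def t_stat(estimate, hypothesised, std_error):
    """t statistic for H0: coefficient = hypothesised value."""
    return (estimate - hypothesised) / std_error

b1 = 1.065870695   # ln(GDP) coefficient from the output
se_b1 = 0.066385   # its standard error from the output

# Assumption: approximate one-sided 5% critical value for t with 172 df.
T_CRIT_5PCT_ONE_SIDED = 1.654

t = t_stat(b1, 1.0, se_b1)
reject_h0 = t > T_CRIT_5PCT_ONE_SIDED  # H1: b1 > 1, so reject for large t

print(f"t = {t:.3f}, reject H0 at 5%: {reject_h0}")
```

Calling `t_stat(b1, 0.0, se_b1)` instead reproduces the t Stat column of the Excel output, which shows why that column answers a different question.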

(vi) Add the variables Pop70 and Diabetes to the log-log equation in (iii) and estimate the following model:

log(Cases)= b0 + b1 log(GDP) +b2 Pop70 +b3 Diabetes + u

Interpret the coefficient of Pop70. Test whether Pop70 and Diabetes are jointly significant at 5% level of significance.

Solution:

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.972645 |
| R Square | 0.946037 |
| Adjusted R Square | 0.939558 |
| Standard Error | 2.585062 |
| Observations | 174 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 20033.35 | 6677.782 | 999.2875 | 1.0722E-107 |
| Residual | 171 | 1142.715 | 6.682544 | | |
| Total | 174 | 21176.06 | | | |

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A |
| ln(GDP) | 1.232289 | 0.06669 | 18.47797 | 1.31E-42 | 1.100647673 | 1.363929 |
| Pop70 | 0.053938 | 0.054051 | 0.997908 | 0.319734 | -0.052754826 | 0.16063 |
| Diabetes | -0.17274 | 0.054876 | -3.14777 | 0.001942 | -0.281057747 | -0.06442 |

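The coefficient of Pop70 is 0.054. Because the dependent variable is in logs and Pop70 is in levels, a one percentage point increase in the share of the population aged over 70 is associated with an increase of about 5.4% in the number of cases, holding GDP and Diabetes constant. The coefficient is not statistically significant (t = 1.00, p-value = 0.320), so there is no evidence that Pop70 affects the number of cases in this model.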

 

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 2.474843133 | 1.910687616 | 1.295263084 | 0.196996 | -1.29705 | 6.246732 |
| ln(GDP) | 0.910243025 | 0.252205332 | 3.609134735 | 0.000404 | 0.412364 | 1.408122 |
| Pop70 | 0.146710908 | 0.144840864 | 1.01291102 | 0.312551 | -0.13922 | 0.432641 |
| Diabetes | -0.128867729 | 0.08948488 | -1.440106183 | 0.151687 | -0.30552 | 0.047784 |
| pop70_diabetes | -0.00610965 | 0.018036742 | -0.33873358 | 0.735231 | -0.04172 | 0.029497 |

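The regression above adds the interaction term pop70_diabetes (and an intercept). Its p-value of 0.735 shows only that the interaction is insignificant; it is not a test of the joint significance of Pop70 and Diabetes. The joint test is an F test of H0: b2 = b3 = 0, which compares the residual sum of squares of the restricted model in part (iii), 1238.217, with that of the unrestricted model in part (vi), 1142.715, both estimated on the same 174 observations: F = [(1238.217 - 1142.715) / 2] / (1142.715 / 171) = 7.15. The 5% critical value of F(2,171) is about 3.05. Since 7.15 > 3.05, we reject H0 and conclude that Pop70 and Diabetes are jointly significant at the 5% level, which is consistent with the individual significance of Diabetes (p-value = 0.002).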
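Joint significance of two regressors is usually tested with an F statistic built from the residual sums of squares of the restricted and unrestricted models. The sketch below applies that formula to the Residual SS values reported in the ANOVA tables of parts (iii) and (vi); the critical value 3.05 is an approximate 5% value for F(2,171) from standard tables and is hard-coded as an assumption.

```python
def joint_f(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F statistic for q exclusion restrictions.

    ssr_restricted   : residual SS of the model without the tested variables
    ssr_unrestricted : residual SS of the model that includes them
    df_unrestricted  : residual degrees of freedom of the unrestricted model
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator

# Residual SS from the ANOVA tables: part (iii) vs part (vi).
f_value = joint_f(1238.217, 1142.715, q=2, df_unrestricted=171)

# Assumption: approximate 5% critical value for F(2, 171).
F_CRIT_5PCT = 3.05

print(f"F = {f_value:.3f}, jointly significant at 5%: {f_value > F_CRIT_5PCT}")
```

The comparison is only valid when both models use the same dependent variable and the same sample, which is the case for the two outputs used here.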

(vii) Create a dummy variable indicating whether or not a country is in Oceania. Add the variable Oceania to the log-log equation in (iv) and estimate the following:

log(Cases)= b0 + b1 log(GDP)+ b2 log(POP)+b3 Oceania + u

Report your regression results in a sample regression function. Interpret the meaning of the coefficient for Oceania.

Solution:

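Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.978114 |
| R Square | 0.956706 |
| Adjusted R Square | 0.950352 |
| Standard Error | 2.31546 |
| Observations | 174 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 20259.27 | 6753.09 | 1259.587 | 7.9E-116 |
| Residual | 171 | 916.7915 | 5.361354 | | |
| Total | 174 | 21176.06 | | | |

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A |
| ln(GDP) | 1.146681 | 0.058383 | 19.64081 | 1.07E-45 | 1.031438 | 1.261925 |
| ln(POP) | 0.012706 | 0.122164 | 0.10401 | 0.917283 | -0.22844 | 0.25385 |
| Oceania | -6.46518 | 0.84265 | -7.67244 | 1.23E-12 | -8.12852 | -4.80184 |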

Estimated sample regression function: log(Cases) = 0 + 1.147 log(GDP) + 0.013 log(POP) - 6.465 Oceania

The reported R Square is 0.957. log(GDP) and Oceania are significant, with p-values < 0.05, while log(POP) is not (p-value = 0.917).

Oceania is a dummy variable, so its coefficient is not an elasticity. Holding GDP and population density constant, log(Cases) is 6.47 lower for a country in Oceania than for a country elsewhere. Since exp(-6.465) = 0.0016, a country in Oceania is predicted to have about 0.16% of the cases of a comparable country outside Oceania, that is, roughly 99.8% fewer cases.
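When the dependent variable is log(Cases) and a regressor enters in levels (a dummy such as Oceania, or a percentage such as Pop70), the exact percentage effect of a one-unit change is 100 * (exp(b) - 1). A minimal sketch using coefficients from the tables above follows; the function name is illustrative.

```python
import math

def pct_effect(coef):
    """Exact % change in y for a one-unit change in x when log(y) is regressed on x."""
    return 100 * (math.exp(coef) - 1)

oceania_coef = -6.46518  # Oceania dummy, part (vii) output
pop70_coef = 0.053938    # Pop70, part (vi) output

# Dummy switching from 0 to 1: predicted cases fall by almost 100%.
print(f"Oceania: {pct_effect(oceania_coef):.2f}%")

# Small coefficients: the exact effect is close to 100 * coefficient.
print(f"Pop70:   {pct_effect(pop70_coef):.2f}%")
```

For small coefficients the shortcut 100 * b is adequate, but for a coefficient as large as the Oceania estimate only the exponential form gives a sensible percentage.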

(viii) Using the model estimated in (vii), test whether the model is overall statistically significant at the 1% level.

Solution:

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.62787987 |
| R Square | 0.394233131 |
| Adjusted R Square | 0.383543128 |
| Standard Error | 2.30696728 |
| Observations | 174 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 588.8157 | 196.2719 | 36.87867 | 2.0737E-18 |
| Residual | 170 | 904.7567 | 5.322098 | | |
| Total | 173 | 1493.572 | | | |

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 99.0% | Upper 99.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 2.210725098 | 1.470129 | 1.503763 | 0.134498 | -0.6913336 | 5.112784 | -1.61905 | 6.040496 |
| ln(GDP) | 0.945645462 | 0.145795 | 6.486126 | 9.23E-10 | 0.65784349 | 1.233447 | 0.565841 | 1.32545 |
| ln(POP) | -0.050070733 | 0.128676 | -0.38912 | 0.697673 | -0.3040798 | 0.203938 | -0.38528 | 0.285138 |
| Oceania | -6.635032667 | 0.847123 | -7.83243 | 4.94E-13 | -8.3072683 | -4.9628 | -8.84184 | -4.42823 |

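This output re-estimates the model in part (vii) with an intercept included. The hypotheses are H0: b1 = b2 = b3 = 0 against H1: at least one slope coefficient is non-zero. The test statistic is F(3,170) = 36.88 with Significance F = 2.07E-18 < 0.01, so we reject H0 and conclude that the model is overall statistically significant at the 1% level. The R Square of 0.394 means that the model explains about 39% of the variation in log(Cases). log(GDP) and Oceania are individually significant at the 1% level, while log(POP) is not (p-value = 0.698).

The coefficient of Oceania is -6.64. Since exp(-6.635) = 0.0013, a country in Oceania is predicted to have about 0.13% of the cases of a comparable country elsewhere, holding GDP and population density constant.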
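For a model that includes an intercept, the overall F statistic in the ANOVA table can be recovered from R Square alone, which is a useful check on the output. A minimal sketch using the R Square, number of observations and number of regressors reported above; the function name is illustrative.

```python
def overall_f(r_squared, n_obs, n_regressors):
    """Overall F statistic for a regression with an intercept.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
    """
    k = n_regressors
    return (r_squared / k) / ((1 - r_squared) / (n_obs - k - 1))

# Values from the regression output with an intercept.
f_value = overall_f(0.394233131, n_obs=174, n_regressors=3)

print(f"F(3, 170) = {f_value:.2f}")
```

The result agrees with the F value in the ANOVA table. The identity relies on the intercept being present, so it should not be applied to the zero-intercept outputs in the earlier parts.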

(ix) Create a dummy variable indicating whether or not a country is in Europe. Add the variable Europe to the log-log equation in (iv) and estimate the following:

log(Cases)= b0 + b1 log(GDP)+ b2 log(POP)+b3 Europe+ u

Test whether Europe has a significant effect at the 1% level of significance. What do you infer about the explanatory power of the model in part (ix) compared to the model that you estimated in part (vii)?

Solution:

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.971197 |
| R Square | 0.943225 |
| Adjusted R Square | 0.936713 |
| Standard Error | 2.65158 |
| Observations | 174 |

ANOVA

| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 19973.78 | 6657.927 | 946.9552 | 8.1E-106 |
| Residual | 171 | 1202.28 | 7.030879 | | |
| Total | 174 | 21176.06 | | | |

| Variable | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 99.0% | Upper 99.0% |
|---|---|---|---|---|---|---|---|---|
| Intercept | 0 | #N/A | #N/A | #N/A | #N/A | #N/A | #N/A | #N/A |
| ln(GDP) | 1.02528 | 0.068623 | 14.94074 | 7.79E-33 | 0.889822 | 1.160737 | 0.846524 | 1.204035 |
| ln(POP) | 0.156855 | 0.139645 | 1.123242 | 0.262909 | -0.11879 | 0.432504 | -0.2069 | 0.520613 |
| Europe | 1.036708 | 0.500926 | 2.069582 | 0.039995 | 0.047913 | 2.025502 | -0.26815 | 2.341562 |

The coefficient of Europe is not significant at the 1% level, since its p-value = 0.040 > 0.01; equivalently, the 99% confidence interval (-0.268, 2.342) contains zero. At the 1% level we therefore cannot conclude that being in Europe affects the number of cases, although the coefficient would be significant at the 5% level.

Compared with the model in part (vii), the model with Europe has a lower R Square (0.943 against 0.957), a lower Adjusted R Square (0.937 against 0.950) and a larger standard error (2.65 against 2.32). The Oceania dummy is highly significant, whereas the Europe dummy is not significant at the 1% level. The model in part (vii) therefore has somewhat more explanatory power than the model in part (ix).
