Undergraduate Econometrics: Problem Set 4
Due: October 9th at 12:00 pm
0. Book Problems 3.4, 3.12, 3.13 (i,ii), 4.5, 4.10, 4.11
1. Playing with OVB In this file we learn about omitted variable bias through a simulation
exercise. The file ovbSimulation.R, which is attached with this assignment, contains code that
simulates the impact of running a single-variable regression when U and X are correlated. In
particular, it begins with the following regression equation:
$Y = \beta_0 + \beta_1 X + U$
and allows for $\rho_{XU} > 0$. The code itself generates an estimate of $\hat{\beta}$ in 100 simulated datasets and
stores the results in a data frame called betaData. This data frame contains two variables: BETAHAT
and BIAS_DUMMY. The first variable is an estimated $\hat{\beta}$ from some sample and the second variable is
a 0/1 variable for whether this estimate is drawn from a biased or unbiased sample.
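The attached ovbSimulation.R is not reproduced here, so the following is only a rough sketch of how a simulation of this kind could be set up. The function name simulate_betahat, the intercept value, the normal distributions, the choice of S datasets per group, and the coding of BIAS_DUMMY (1 = biased) are all assumptions made for illustration; the real file may differ.

```r
# Hypothetical sketch only: the real ovbSimulation.R may be organized differently.
set.seed(1)
S     <- 100   # simulated datasets per group (assumed)
N     <- 100   # observations per dataset (assumed)
rho   <- 0.5   # correlation between X and U in the biased samples (assumed)
beta0 <- 1     # intercept (assumed)
beta1 <- 2     # true slope

simulate_betahat <- function(n, rho_xu) {
  x <- rnorm(n)
  # U has variance 1 and correlation rho_xu with X
  u <- rho_xu * x + sqrt(1 - rho_xu^2) * rnorm(n)
  y <- beta0 + beta1 * x + u
  unname(coef(lm(y ~ x))[2])   # OLS slope estimate
}

betaData <- data.frame(
  BETAHAT    = c(replicate(S, simulate_betahat(N, 0)),
                 replicate(S, simulate_betahat(N, rho))),
  BIAS_DUMMY = rep(c(0, 1), each = S)   # assumed coding: 1 = biased sample
)
```

Under these assumptions the estimates in the group with correlated X and U center away from 2, while the other group centers near 2.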
a) Show that $E(U \mid X) = 0$ implies that $E(UX) = 0$. Then, explain why this means that OLS does
not work if $\rho_{XU} \neq 0$. Hint: Use the law of total expectations on $E(UX \mid X)$.
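One way the hint in (a) might be unpacked is sketched below. It is an outline of the standard argument, not necessarily the form of answer the course expects.

```latex
\begin{align*}
E(UX) &= E\big[E(UX \mid X)\big] && \text{law of total expectations} \\
      &= E\big[X \, E(U \mid X)\big] && \text{$X$ is fixed once we condition on $X$} \\
      &= E[X \cdot 0] = 0 .
\end{align*}
% Since E(U | X) = 0 also gives E(U) = 0,
% Cov(X, U) = E(UX) - E(X)E(U) = 0, and therefore \rho_{XU} = 0.
```

Read in the other direction, $\rho_{XU} \neq 0$ means $E(UX) \neq 0$, so $E(U \mid X) = 0$ cannot hold.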
b) At the bottom of the code is some space labeled Student Analysis. Here, fill in code to
calculate the mean of the $\hat{\beta}$ estimates in each of the biased and unbiased samples. Also fill in the
code to plot the overlapping densities. Both of these are commands from previous assignments.
Hint: Remember that to index a subset of data we can do
betaData$BETAHAT[betaData$BIAS_DUMMY==?]
Your code should look something like this. What goes in "?"?
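As a minimal sketch of the indexing pattern in the hint, the block below builds a made-up stand-in for betaData and then computes group means and overlapping density plots. The numbers are invented, and the choice of which BIAS_DUMMY value marks which group is an assumption; the plotting commands from previous assignments may differ from the base R calls used here.

```r
set.seed(2)
# Toy stand-in for betaData; values are made up for illustration only.
betaData <- data.frame(
  BETAHAT    = c(rnorm(100, mean = 2.0, sd = 0.1),
                 rnorm(100, mean = 2.5, sd = 0.1)),
  BIAS_DUMMY = rep(c(0, 1), each = 100)
)

# Logical indexing keeps only the rows where the condition is TRUE
mean0 <- mean(betaData$BETAHAT[betaData$BIAS_DUMMY == 0])
mean1 <- mean(betaData$BETAHAT[betaData$BIAS_DUMMY == 1])

# Kernel density estimate for each group, drawn on shared axes
d0 <- density(betaData$BETAHAT[betaData$BIAS_DUMMY == 0])
d1 <- density(betaData$BETAHAT[betaData$BIAS_DUMMY == 1])
plot(d0, xlim = range(d0$x, d1$x), ylim = range(0, d0$y, d1$y),
     main = "Density of BETAHAT by group", xlab = "BETAHAT")
lines(d1, lty = 2)
legend("topright", legend = c("BIAS_DUMMY = 0", "BIAS_DUMMY = 1"), lty = c(1, 2))
```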
c) At the top of the code is a variable $\rho$ which governs the correlation between X and U. First,
set $\rho = 0$ and run the code. Report the estimated means in the biased and unbiased samples. The
true value is $\beta = 2$. Perform a two-sided t-test for each sample on whether or not $\hat{\beta}$ is statistically
different from 2. As a reminder, to do a t-test, first calculate the mean and the standard deviation,
then construct the t-statistic. You should report two separate t-statistics for this exercise (one for
each of the 0/1 groups).
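The reminder in (c) can be sketched as follows on made-up estimates. This reads the test as a one-sample test on the mean of the simulated $\hat{\beta}$ values, so the standard error is the standard deviation divided by the square root of the number of estimates. That reading is an assumption; if the course intends the standard deviation of $\hat{\beta}$ itself as the standard error, the sqrt(n) term would be dropped.

```r
set.seed(3)
betahat <- rnorm(100, mean = 2, sd = 0.1)   # made-up estimates for one group

null_value <- 2
n      <- length(betahat)
se     <- sd(betahat) / sqrt(n)              # standard error of the mean
t_stat <- (mean(betahat) - null_value) / se
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value

# Built-in one-sample t-test, for comparison with the hand calculation
check <- t.test(betahat, mu = null_value)
```

The hand-built t_stat and p_val should agree with check$statistic and check$p.value.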
d) Now set $\rho = 0.55$. Plot the distribution of $\hat{\beta}$ by group. Then repeat (c) for the biased group. Is
$\hat{\beta}$ now statistically significantly different from 2?
e) Repeat (d) for $\rho = 0.01$. Relative to the sampling variation in $\hat{\beta}$, does the bias seem too important
here? (No rigorous answer.)
f) Increase the sample size to N = 500 and rerun the code for $\rho = 0.5$. What happens to the
variance of $\hat{\beta}$ in both the biased and the unbiased samples? What does this exercise suggest about the
usefulness of having a large sample size when your estimator is biased?
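For reference when thinking about (f), the standard large-sample expression for the OLS slope when X and U are correlated is sketched below. The symbols $\sigma_U$ and $\sigma_X$ (the standard deviations of U and X) are introduced here for illustration and do not appear in the problem statement.

```latex
\[
\hat{\beta} \xrightarrow{\;p\;} \beta + \frac{\operatorname{Cov}(X, U)}{\operatorname{Var}(X)}
  = \beta + \rho_{XU}\,\frac{\sigma_U}{\sigma_X},
\qquad
\operatorname{Var}(\hat{\beta}) \text{ shrinks roughly in proportion to } \frac{1}{N}.
\]
```

The second term in the limit does not depend on N, while the sampling variance does.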
g) Optional problem for those interested in exploring computation. Increase the number of
samples to 500 (this is the S variable in the code). Fix the number of observations to 100 and let
$\rho = 0.25$. In this case, calculate the fraction of the time that you would estimate $\hat{\beta}$ to be statistically
indistinguishable from 2 despite the bias. Hint: You will first need to calculate the
upper and lower bound on $\hat{\beta}$ at which you would not reject the null hypothesis with $\beta = 2$ and the
$\sigma^2_{\hat{\beta}}$ as calculated in the sample.
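A minimal sketch of the calculation described in the hint, on made-up estimates. The 5% significance level, the normal critical value, and the use of the spread of the estimates across samples as the standard error are assumptions; the intended setup may differ.

```r
set.seed(4)
S <- 500
# Made-up biased estimates standing in for BETAHAT in the biased group
betahat <- rnorm(S, mean = 2.25, sd = 0.1)

null_value <- 2
se_hat <- sd(betahat)      # spread of the estimates across samples
crit   <- qnorm(0.975)     # two-sided 5% critical value (normal approximation)

# Range of estimates for which the null of 2 would not be rejected
lower <- null_value - crit * se_hat
upper <- null_value + crit * se_hat

fail_to_reject <- betahat >= lower & betahat <= upper
frac <- mean(fail_to_reject)   # share of samples that look consistent with 2
```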
2. Do Doctors Affect Drinking? In this problem we will exploit the drinkData.Rdata dataset.
This data is taken from “The Effect Of Physician Advice On Alcohol Consumption: Count
Regression With An Endogenous Treatment Effect,” by Donald S. Kenkel and Joseph V. Terza (Journal
of Applied Econometrics, 16: 165-184 (2001)). The goal of this paper was to understand if doctors
could impact people’s drinking activity. The authors do some sophisticated work to try and deal
with concerns about causality. We will not replicate their methods. Instead, we will ignore issues
related to omitted variable bias and focus on the tools of multiple regression. There is a complete
description of the dataset at the back of this problem set.
a) First let us get a feel for the dataset. The variable DRINKS is the number of drinks an
individual has had and the variable ADVISE is a 0/1 variable for whether a person’s doctor has
told them to drink less. Report the mean number of drinks per person in each group. Similarly,
calculate the mean education and income by group. Finally, what fraction of individuals in each
group are between 30 and 40 and how many are between 40 and 50? Do a t-test for a difference in
group means of income and education. Do they appear to be different?
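The group summaries in (a) could be computed along the following lines. The data frame toy is made up; only its column names mirror the variable list in this problem set, and the name of the object stored in drinkData.Rdata is not given here. The two-sample test shown is the Welch (unequal variance) version that t.test uses by default, which is an assumption about the version the course has in mind.

```r
set.seed(5)
n <- 200
# Made-up data for illustration; not the Kenkel and Terza dataset
toy <- data.frame(
  ADVISE  = rbinom(n, 1, 0.3),
  EDUC    = round(rnorm(n, mean = 13, sd = 2)),
  EDITINC = runif(n, min = 0.5, max = 6),
  AGE30   = rbinom(n, 1, 0.25)
)

mean_educ  <- tapply(toy$EDUC,  toy$ADVISE, mean)   # mean education by group
share_3040 <- tapply(toy$AGE30, toy$ADVISE, mean)   # mean of a 0/1 dummy = fraction
count_3040 <- tapply(toy$AGE30, toy$ADVISE, sum)    # number of people in the age band

# Two-sample t-test for a difference in mean education between groups
tt <- t.test(EDUC ~ ADVISE, data = toy)
```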
b) What is a possible source of omitted variable bias in a regression of DRINKS on ADVISE?
You may think about the variables above or something else. Remember: omitted variable bias has
two ingredients.
c) Regress DRINKS on ADVISE and do a one-sided significance test on ADVISE. What is the sign
of the coefficient on ADVISE? Why do you think this might be the case?
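A one-sided test on a regression coefficient can be sketched as below, using made-up data. The relationship built into the toy data says nothing about the real dataset, and the direction of the alternative hypothesis is left open: both one-sided p-values are computed so that either can be used.

```r
set.seed(7)
n <- 250
toy <- data.frame(ADVISE = rbinom(n, 1, 0.3))
toy$DRINKS <- rpois(n, lambda = 8 + 3 * toy$ADVISE)   # made-up relationship

fit    <- lm(DRINKS ~ ADVISE, data = toy)
est    <- coef(summary(fit))["ADVISE", ]   # estimate, std. error, t value, p-value
t_stat <- unname(est["t value"])
df     <- df.residual(fit)

p_upper <- pt(t_stat, df, lower.tail = FALSE)   # alternative: coefficient > 0
p_lower <- pt(t_stat, df)                       # alternative: coefficient < 0
```

The two-sided p-value reported by summary() is twice the smaller of the two one-sided values.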
d) Now run the same regression but including income, education and all the age dummies as
controls. Report the results (by hand). What happens to the coefficient on ADVISE?
e) Do an F-test to determine if age does not matter for drinking habits.
f) Create a variable called EVERDRINK that is a 0/1 variable for DRINKS being positive.
Regress this on all age variables. Do an F-test to determine if the choice of whether to drink at all
depends on age.
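The F-tests in (e) and (f) compare a regression that includes the age dummies with one that leaves them out. A minimal sketch on made-up data with only two age dummies follows. It uses the homoskedasticity-only F-statistic from anova(), which is an assumption about the version the course expects, and the toy data are not the real dataset.

```r
set.seed(6)
n   <- 300
grp <- sample(c("under30", "30to40", "40to50"), n, replace = TRUE)
toy <- data.frame(
  ADVISE = rbinom(n, 1, 0.3),
  AGE30  = as.numeric(grp == "30to40"),
  AGE40  = as.numeric(grp == "40to50")
)
# Made-up outcome for illustration only
toy$DRINKS <- rpois(n, lambda = exp(1 + 0.3 * toy$ADVISE - 0.4 * toy$AGE30 - 0.8 * toy$AGE40))

unrestricted <- lm(DRINKS ~ ADVISE + AGE30 + AGE40, data = toy)
restricted   <- lm(DRINKS ~ ADVISE, data = toy)   # imposes: all age coefficients = 0
ftest <- anova(restricted, unrestricted)

# The same statistic by hand from the two sums of squared residuals
ssr_r  <- sum(resid(restricted)^2)
ssr_u  <- sum(resid(unrestricted)^2)
q      <- 2                                        # number of restrictions
F_hand <- ((ssr_r - ssr_u) / q) / (ssr_u / df.residual(unrestricted))
p_hand <- pf(F_hand, q, df.residual(unrestricted), lower.tail = FALSE)
```

For (f), the same comparison applies with a 0/1 outcome such as as.numeric(DRINKS > 0) on the left-hand side.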
Variable    Description
DRINKS      Total drinks over a two-week period
ADVISE      Dummy variable for whether the individual has been told to drink less by a doctor
EDITINC     Monthly income ($1000)
AGE30       30 < age ≤ 40
AGE40       40 < age ≤ 50
AGE50       50 < age ≤ 60
AGE60       60 < age ≤ 70
AGEGT70     70 < age
EDUC        Years of schooling
BLACK       Black
OTHER       Non-white, non-black
MARRIED     Married
WIDOW       Widowed
DIVSEP      Divorced or separated
EMPLOYED    Employed
UNEMPLOY    Unemployed
NORTHE      Northeast
MIDWEST     Midwest
SOUTH       South
MEDICARE    Insurance through Medicare
MEDICAID    Insurance through Medicaid
CHAMPUS     Military insurance
HLTHINS     Health insurance
REGMED      Regular source of care
DRI         See same doctor
MAIORLIM    Limits on major daily activities
SOMELIM     Limits on some daily activities
HVDIAB      Have diabetes
HHRTCOND    Have heart condition
HADSTROKE   Had stroke