## Economic and Statistical Software: Introduction to R


(i) Clearly state your name and student ID.

(iii) There are 100 marks in total.

(iv) You can answer the questions in English or Chinese unless otherwise stated.

1. (10 marks) Explain why endogeneity is rarely considered in machine learning forecasting exercises. Describe your understanding of endogeneity first, then use one machine learning algorithm that you are familiar with as an example.
2. (20 marks) Along with this final assignment, you should also find two PDF files. These are academic articles published in the Journal of Economic Perspectives by well-known scholars. The two papers are:

(a) “Mismeasured Variables in Econometric Analysis: Problems from the Right and Problems from the Left” by Jerry Hausman (2001)

(b) “Avoiding Invalid Instruments and Coping with Weak Instruments” by Michael P. Murray (2006)

Please write a short report in English on (i) paper (a), if the last digit of your student ID is an odd number; or (ii) paper (b), if the last digit of your student ID is an even number. DO NOT write reports on both papers! Your report should be at least 300 words and summarize the article. You should discuss the findings, contribution, and conclusion of the paper. You can add equations and technical terms if necessary. You can cite other references, but keep in mind that the references shall not be counted as part of the 300 words. Any form of plagiarism will not be tolerated.

3. (20 marks) This question is about the VIX data set vixlarge.csv, which contains the VIX data and the associated dates.

(a) (5 marks) Plot the VIX data against date as a line plot. Clearly label the horizontal and vertical axes.
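A minimal sketch of the required line plot. Since vixlarge.csv is not bundled here, a simulated stand-in series is used, and the column names "Date" and "VIX" are assumptions; in the actual answer the first assignment would be replaced by `read.csv("vixlarge.csv")` with the file's real column names.

```r
# Simulated stand-in for the VIX series (vixlarge.csv not available here);
# the column names "Date" and "VIX" are assumptions.
set.seed(1)
vix <- data.frame(
  Date = seq(as.Date("2005-01-03"), by = "day", length.out = 500),
  VIX  = 20 + cumsum(rnorm(500))
)

# Line plot with labelled axes, as the question requires.
plot(vix$Date, vix$VIX, type = "l",
     xlab = "Date", ylab = "VIX", main = "VIX over time")
```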

(b) (10 marks) Let the dependent variable y be the VIX, and let the first and second columns of the independent variable X be the intercept term and the lag of VIX (set x0 = 0). Conduct a one-step-ahead rolling-window exercise.

i. Set the window length at 3000 and make a forecast of the next period y_{t+1}.

ii. Start from the beginning and roll until the end.

iii. For each roll, make forecasts using the ridge and lasso methods with tuning parameter λ = 1, 10 for each method. In total, we compare 4 methods.

iv. Compare the forecasts with the actual true values of y_{t+1}. Compute the mean squared forecast errors and the mean absolute forecast errors for the four methods and report them in a table.

v. Which method has the best performance and which one has the worst? Provide your understanding and explanation of the results.

vi. Come up with an algorithm that can beat the best-performing method from question v. Clearly describe your motivation, the details of the algorithm, and the results.
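The rolling-window loop above can be sketched as below. To keep the example fast and self-contained, a simulated AR(1) series stands in for the VIX and the window is shortened from 3000 to 100. Ridge is computed in closed form with an unpenalized intercept; with a single regressor, the lasso slope is the soft-thresholded OLS slope under the (1/(2n))·RSS + λ|b| objective. The function names and the exact penalty scaling are my own choices, not the course's.

```r
# Synthetic AR(1) series standing in for the VIX; window shortened from
# 3000 to 100 so the sketch runs quickly.
set.seed(2)
n <- 200; win <- 100
y <- numeric(n); y[1] <- 20
for (t in 2:n) y[t] <- 2 + 0.9 * y[t - 1] + rnorm(1)
x <- c(0, y[-n])                  # lagged VIX, with x0 = 0 as instructed

# Ridge in closed form; the intercept is left unpenalized.
ridge_fc <- function(xv, yv, lam, xnew) {
  X <- cbind(1, xv)
  b <- solve(t(X) %*% X + lam * diag(c(0, 1)), t(X) %*% yv)
  c(cbind(1, xnew) %*% b)
}

# Lasso with one regressor: soft-threshold the OLS slope
# (objective (1/(2n)) * RSS + lam * |b|; this scaling is my choice).
lasso_fc <- function(xv, yv, lam, xnew) {
  xc <- xv - mean(xv); yc <- yv - mean(yv)
  sxy <- mean(xc * yc); sxx <- mean(xc^2)
  b <- sign(sxy) * max(abs(sxy) - lam, 0) / sxx
  mean(yv) - b * mean(xv) + b * xnew
}

lams <- c(1, 10)
rolls <- win:(n - 1)              # last in-window observation of each roll
err <- matrix(NA, length(rolls), 4,
              dimnames = list(NULL, c("ridge1", "ridge10", "lasso1", "lasso10")))
for (i in seq_along(rolls)) {
  t <- rolls[i]; idx <- (t - win + 1):t
  for (j in 1:2) {
    err[i, j]     <- ridge_fc(x[idx], y[idx], lams[j], y[t]) - y[t + 1]
    err[i, j + 2] <- lasso_fc(x[idx], y[idx], lams[j], y[t]) - y[t + 1]
  }
}
res <- rbind(MSFE = colMeans(err^2), MAFE = colMeans(abs(err)))
round(res, 3)
```

With λ = 10 the single-regressor lasso typically shrinks the slope to zero, so the forecast collapses to the window mean; comparing its errors against ridge illustrates point v.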

(c) (5 marks) We now consider a more general forecasting exercise with the model

y_{t+h} = f(x_t) + u_{t+h}, for t = 1, …, n − h,

where h is the forecasting horizon. Note that part (b) is the special case with h = 1 and f(·) being the ridge or LASSO estimator. We now replicate part (b) with h ∈ {1, 5, 10, 22} using LASSO and the regression tree. Choose your own tuning parameters this time, state them clearly, and report your forecasting results in a table. What do you observe?
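The key mechanical change in (c) is aligning y_{t+h} with x_t, which shrinks the usable sample to t = 1, …, n − h. A sketch with the regression tree (rpart, shipped with the standard R distribution) on a simulated series; the cp tuning value is my own choice, and the LASSO branch would reuse the part-(b) estimator with its own λ.

```r
library(rpart)

# Simulated persistent series standing in for the VIX.
set.seed(3)
n <- 300
y <- as.numeric(arima.sim(list(ar = 0.9), n)) + 20

horizons <- c(1, 5, 10, 22)
for (h in horizons) {
  # Regress y_{t+h} on x_t = y_t: the pairs run over t = 1, ..., n - h.
  d <- data.frame(yfut = y[(1 + h):n], x = y[1:(n - h)])
  fit <- rpart(yfut ~ x, data = d, control = rpart.control(cp = 0.01))
  cat(sprintf("h = %2d  in-sample MSE = %.3f\n",
              h, mean((predict(fit) - d$yfut)^2)))
}
```

Because persistence decays with the horizon, the fit should deteriorate as h grows, which is the pattern the question asks you to observe.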

4. (30 marks, 5 marks each) This question requires you to use the movielarge.csv data. As usual, the OpenBox variable in the first column is the response and all others are predictors. This larger data set contains 18 predictors.

(a) Apply a regression tree, bagging, and a random forest to fit the data using all the predictors. Clearly state the approach you adopt (which method, whether you prune, and if so how, your number of bootstrap samples B, etc.). Report the respective centered R²s.
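A sketch of hand-rolled bagging with rpart on simulated data standing in for movielarge.csv (the size n = 94 is inferred from part (d), where a 90 + 4 split is used; the predictor names and data-generating process are invented). In practice the randomForest package would be the usual tool for both bagging (mtry = p) and the random forest; it is avoided here only to keep the sketch dependency-free.

```r
library(rpart)

# Simulated stand-in for movielarge.csv: OpenBox response, 18 predictors.
set.seed(4)
n <- 94; p <- 18
X <- as.data.frame(matrix(rnorm(n * p), n, p))
names(X) <- paste0("x", 1:p)
d <- cbind(OpenBox = 2 * X$x1 - X$x2 + rnorm(n), X)

# Bagging: average B trees, each grown on a bootstrap resample.
B <- 100
preds <- matrix(0, n, B)
for (b in 1:B) {
  idx <- sample(n, replace = TRUE)
  fit <- rpart(OpenBox ~ ., data = d[idx, ])
  preds[, b] <- predict(fit, newdata = d)
}
bag <- rowMeans(preds)

# Centered R^2 of the bagged fit.
r2 <- 1 - sum((d$OpenBox - bag)^2) / sum((d$OpenBox - mean(d$OpenBox))^2)
round(r2, 3)
```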

(b) Briefly describe how we measure predictor importance using the OOB error. Do not copy from the lecture notes; use your own language.

(c) Use the OOB error to measure predictor importance under bagging. Clearly label the top 5 predictors.
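OOB permutation importance for (b)/(c) can be sketched by hand: each bootstrap tree is evaluated on the observations it did not see, one predictor at a time is permuted among those OOB rows, and the average rise in MSE across trees is that predictor's importance. Simulated data again stand in for movielarge.csv, with p cut to 6 to keep the double loop short; only x1 carries signal by construction.

```r
library(rpart)

set.seed(5)
n <- 94; p <- 6
X <- as.data.frame(matrix(rnorm(n * p), n, p))
names(X) <- paste0("x", 1:p)
d <- cbind(OpenBox = 3 * X$x1 + rnorm(n), X)   # only x1 truly matters

B <- 100
imp <- setNames(numeric(p), names(X))
for (b in 1:B) {
  idx <- sample(n, replace = TRUE)
  oob <- setdiff(1:n, idx)                     # rows not in this bootstrap
  fit <- rpart(OpenBox ~ ., data = d[idx, ])
  base_mse <- mean((d$OpenBox[oob] - predict(fit, d[oob, ]))^2)
  for (j in 1:p) {
    dp <- d[oob, ]
    dp[[j + 1]] <- sample(dp[[j + 1]])         # permute predictor j on OOB rows
    imp[j] <- imp[j] + mean((dp$OpenBox - predict(fit, dp))^2) - base_mse
  }
}
sort(imp / B, decreasing = TRUE)               # larger rise in MSE = more important
```

Sorting the averaged increases and reporting the first five entries gives the "top 5 predictors" the question asks for.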

(d) Treat the first 90 observations as the training set and the remaining 4 observations as the evaluation set. Compare the prediction performance of the regression tree, bagging, and the random forest by MSFE. Describe your results in detail.

(e) Repeat (d), but this time set the first 80 observations as the training set and the rest as the evaluation set.
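The split-sample comparison in (d)/(e) can be sketched as follows, again with simulated data in place of movielarge.csv and with hand-rolled bagging next to a single tree; a random forest (e.g. from the randomForest package) would slot into the same `msfe` helper, which is an invented name. The MSFE values produced apply to the simulated data only.

```r
library(rpart)

set.seed(6)
n <- 94; p <- 18
X <- as.data.frame(matrix(rnorm(n * p), n, p))
names(X) <- paste0("x", 1:p)
d <- cbind(OpenBox = 2 * X$x1 - X$x2 + rnorm(n), X)

# Fit on the first n_train rows, forecast the remainder, compare by MSFE.
msfe <- function(n_train, B = 100) {
  tr <- d[1:n_train, ]; te <- d[(n_train + 1):n, ]
  tree <- rpart(OpenBox ~ ., data = tr)
  bag <- rowMeans(sapply(1:B, function(b) {
    idx <- sample(nrow(tr), replace = TRUE)
    predict(rpart(OpenBox ~ ., data = tr[idx, ]), newdata = te)
  }))
  c(tree    = mean((te$OpenBox - predict(tree, newdata = te))^2),
    bagging = mean((te$OpenBox - bag)^2))
}

rbind(`train = 90` = msfe(90),   # part (d): 4-observation evaluation set
      `train = 80` = msfe(80))   # part (e): 14-observation evaluation set
```

With only 4 evaluation observations in (d), the MSFE is very noisy; the 14-observation split in (e) gives a steadier comparison, which is worth noting when describing the results.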