## Statistical model

Your task in this assessment is to use the data from the first 5 401 records to build a statistical model that will help you to:

• Understand the social, demographic and economic factors associated with variation between MSOAs in numbers of Covid deaths during the period March–July 2020; and
• Estimate the numbers of deaths for each of the 1 800 records where you don’t have this information.

### Detailed instructions

1. Read the data into your chosen software package and carry out any necessary recoding (e.g. to deal with the fact that -1 represents a missing value).
2. Carry out an exploratory analysis that will help you to start building a sensible statistical model to understand and predict the numbers of Covid deaths in each MSOA. This analysis should aim to identify an appropriate set of candidate variables to take into the subsequent modelling exercise, as well as to identify any important features of the data that may have implications for the modelling. You will need to consider the context of the problem to guide your choice of exploratory analysis. See the ‘Hints’ below for some ideas.
3. Using your exploratory analysis as a starting point, develop a statistical model that enables you to predict the number of Covid deaths for each MSOA based on (a subset of) the other variables in the dataset, and also to understand the variation in deaths between different MSOAs. To be convincing, you will need to consider a range of models and to use an appropriate suite of diagnostics to assess them. Ultimately, however, you are required to recommend a single model that is suitable for interpretation, and to justify your recommendation. Your chosen model should be either a linear model, a generalized linear model or a generalized additive model.
4. Use your chosen model to predict the number of Covid deaths for each MSOA where this information is missing, and also to estimate the standard deviation of your prediction errors.
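The four steps above can be sketched in R as follows. This is only an illustration under stated assumptions, not a recommended model: the data are simulated, every column name except `ID` (`pop`, `median_age`, `deaths`) is hypothetical, and a Poisson GLM with a log-population offset is just one candidate among the model classes the brief allows. The prediction-error standard deviation is approximated by combining the Poisson variance of a new count with the squared standard error of the fitted mean, which assumes the Poisson variance function holds and that the new observation is independent of the fit.

```r
# Minimal sketch on simulated data. Every column name except ID is hypothetical;
# the real file UKCovidWave1.csv will have its own variable names.
set.seed(1)
n <- 300
dat <- data.frame(
  ID         = seq_len(n),
  pop        = round(runif(n, 5000, 12000)),  # hypothetical population size
  median_age = round(rnorm(n, 40, 6), 1)      # hypothetical covariate
)
dat$deaths <- rpois(n, dat$pop * exp(-9 + 0.04 * dat$median_age))
dat$deaths[201:300] <- -1                     # mimic the records with unknown counts

# Step 1: recode the missing-value code -1 as NA.
dat[dat == -1] <- NA
known   <- dat[!is.na(dat$deaths), ]
unknown <- dat[is.na(dat$deaths), ]

# Step 2: a quick exploratory check of crude death rate against a covariate.
print(cor(known$deaths / known$pop, known$median_age))

# Step 3: one candidate model, a Poisson GLM with a log-population offset.
fit <- glm(deaths ~ median_age + offset(log(pop)),
           family = poisson, data = known)
# Pearson dispersion statistic; values well above 1 would suggest overdispersion.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)

# Step 4: predictions and an approximate prediction-error standard deviation.
pred <- predict(fit, newdata = unknown, type = "response", se.fit = TRUE)
# Var(Y - mu_hat) is approximated by Var(Y) + Var(mu_hat) = mu_hat + se.fit^2.
pred_sd <- sqrt(pred$fit + pred$se.fit^2)
print(head(data.frame(ID = unknown$ID, pred = pred$fit, sd = pred_sd)))
```

If the dispersion statistic were well above 1, the Poisson variance term would understate the prediction error, and a model with a different variance function (for example a negative binomial fit via `MASS::glm.nb`, which is on the permitted list) might be worth comparing.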

#### You are required to submit three files, as follows:

• A report on your analysis, not exceeding 2 500 words of text plus two pages of graphs and / or tables. The word count includes titles, footnotes, appendices, references etc. — in fact it includes everything except the two pages of graphs / tables and, if present, the separate page describing the contribution of each pair member (see below). Your report should be in three sections, as follows:

Section I: Describe briefly what aspects of the problem context you considered at the outset, how you used these to start your exploratory analysis, and what important points emerged from this exploratory analysis.

Section II: Describe briefly (without too many technical details) what models you considered in step (3) above, and why you chose the model that you did.

Section III: State your final model clearly, summarise what your model tells you about the factors associated with variation of death counts in each MSOA, and discuss any potential limitations of the model.

Your report should not include any computer code. It should include some graphs and / or tables, but only those that support your main points. Graphs and tables must appear on separate pages, or they will be included in the word count.

• An R script or SAS program corresponding to your analysis and predictions. Your script/program should run without user intervention on any computer with R or SAS installed, providing the file UKCovidWave1.csv is present in the current working directory / current folder. When run, it should produce any results that are mentioned in your report, together with the predictions and the associated standard deviations.

You may not create any additional input files that can be referenced by your script, nor should you write any code that requires access to the internet in order to run. If you use R, however, you may use the following additional libraries if you wish (together with other libraries that are loaded automatically by these): mgcv, ggplot2, grDevices, RColorBrewer, lattice and MASS. You may not use any other add-on libraries: for present purposes, an “add-on library” is one that requires a library() or require() command, or equivalent (e.g. the package::command syntax), before it can be used if your R system is installed using default settings.

• A text file containing your predictions for the 1 800 observations with missing counts.

The file should contain three columns, separated by spaces and with no header. The first column should be the record identifier (corresponding to variable ID in file UKCovidWave1.csv); the second should be the corresponding count prediction, and the third should be the standard deviation of your prediction error.
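The file layout described above (three space-separated columns, no header) could be produced with base R's `write.table`, as in the sketch below. The ID values, predictions, standard deviations and the output file name `predictions.txt` are all made up for illustration; the brief does not specify a file name.

```r
# Hypothetical predictions; in practice these would come from the fitted model.
preds <- data.frame(
  ID   = c(5402, 5403, 5404),
  pred = c(3.21, 7.05, 1.48),
  sd   = c(1.85, 2.71, 1.25)
)

out_file <- "predictions.txt"  # hypothetical file name, not given in the brief

# Space-separated, no header row, no row names, no quoting.
write.table(preds, file = out_file, sep = " ",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Read the file back to confirm the layout.
print(readLines(out_file))
```

Writing to a relative path keeps the script runnable from any working directory, in line with the requirement that it run without user intervention.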