COMP5434 (Fall 2019) Big Data Computing



Individual Assignment 2
Due Date: 10:00 am, 2nd December, 2019

Please submit your assignment on Blackboard
and follow the requirements in Section 2.

1. Problem statement

A sample input file is given below. Each line describes one point of interest (POI) and contains three fields separated by white space: a keyword, followed by the coordinate values x and y.

park 3 5
lake 2 3
mall 1 4
park 2 4
lake 9 8
mall 2 7
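
As an illustration of the input format, the sketch below parses one line into its keyword and coordinates. The class name PoiLine and its fields are hypothetical, and the sketch assumes well-formed lines with exactly three whitespace-separated fields and numeric coordinates.

```java
/** Hypothetical helper: one parsed input line of the form "<keyword> <x> <y>". */
final class PoiLine {
    final String keyword;
    final double x;
    final double y;

    private PoiLine(String keyword, double x, double y) {
        this.keyword = keyword;
        this.x = x;
        this.y = y;
    }

    /** Splits on runs of white space; rejects lines that do not have three fields. */
    static PoiLine parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new PoiLine(fields[0],
                Double.parseDouble(fields[1]),
                Double.parseDouble(fields[2]));
    }

    public static void main(String[] args) {
        PoiLine p = PoiLine.parse("park 3 5");
        System.out.println(p.keyword + " -> (" + p.x + ", " + p.y + ")");
    }
}
```

Parsing the coordinates as double rather than int keeps later averages and distances in floating point.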

We measure the distance between two points p1=(x1,y1) and p2=(x2,y2) by:

$$\mathrm{dist}(p_1, p_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
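
The distance above is the ordinary Euclidean distance, and it translates directly into Java. This is a minimal sketch; the class name Distance and the method name dist are hypothetical.

```java
/** Hypothetical helper: Euclidean distance between (x1, y1) and (x2, y2). */
final class Distance {
    static double dist(double x1, double y1, double x2, double y2) {
        double dx = x1 - x2;
        double dy = y1 - y2;
        // Square root of the sum of squared coordinate differences.
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        // Distance between the two "park" points of the sample input.
        System.out.println(dist(3, 5, 2, 4));
    }
}
```

For the two "park" points (3,5) and (2,4), both coordinate differences are 1, so the distance is the square root of 2, roughly 1.414.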

Each keyword k is associated with a group G(k), which consists of all points whose keyword is k.

[Example] The group of “park” contains two points: (3,5) and (2,4).
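
The grouping can be pictured as a map from each keyword to its list of points, which is roughly what a MapReduce shuffle produces when the keyword is used as the key. The sketch below builds that map in plain Java without Hadoop; the class name PoiGroups is hypothetical and well-formed input lines are assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: builds G(k) for every keyword k. */
final class PoiGroups {
    /** Returns keyword -> list of {x, y} points, with keywords in sorted order. */
    static Map<String, List<double[]>> group(List<String> lines) {
        Map<String, List<double[]>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            double[] point = {Double.parseDouble(f[1]), Double.parseDouble(f[2])};
            groups.computeIfAbsent(f[0], k -> new ArrayList<>()).add(point);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "park 3 5", "lake 2 3", "mall 1 4",
                "park 2 4", "lake 9 8", "mall 2 7");
        for (Map.Entry<String, List<double[]>> e : group(sample).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " points");
        }
    }
}
```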

There are 2 questions in this programming assignment.
You should write a MapReduce program to solve each of them.

Question Q1: Find the centroid (i.e., the mean position of points) of each group.

[Example]

Input: the sample input above

Output:

lake  5.5  5.5
mall  1.5  5.5
park  2.5  4.5
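
The sample output can be checked by hand with a few lines of plain Java. The sketch below is not a MapReduce program: it only mirrors the idea of accumulating a coordinate sum and a count per keyword, then dividing. The class name CentroidSketch is hypothetical and well-formed input is assumed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: centroid of each keyword group, computed in memory. */
final class CentroidSketch {
    /** Returns keyword -> {meanX, meanY}. */
    static Map<String, double[]> centroids(List<String> lines) {
        // keyword -> {sumX, sumY, count}
        Map<String, double[]> acc = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            double[] a = acc.computeIfAbsent(f[0], k -> new double[3]);
            a[0] += Double.parseDouble(f[1]);
            a[1] += Double.parseDouble(f[2]);
            a[2] += 1;
        }
        Map<String, double[]> result = new TreeMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            double[] a = e.getValue();
            result.put(e.getKey(), new double[] {a[0] / a[2], a[1] / a[2]});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "park 3 5", "lake 2 3", "mall 1 4",
                "park 2 4", "lake 9 8", "mall 2 7");
        for (Map.Entry<String, double[]> e : centroids(sample).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue()[0] + "\t" + e.getValue()[1]);
        }
    }
}
```

For "lake", the sums are 2 + 9 = 11 and 3 + 8 = 11 over 2 points, which gives the centroid (5.5, 5.5) shown in the example.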

Question Q2: Find the diameter (i.e., the maximum distance between any two points inside a group) of each group.

[Example]

Input: the sample input above

Output:

lake  8.602
mall  3.162
park  1.414
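
The diameter of one group can be illustrated with a brute-force comparison of every pair of points. This is only an in-memory sketch of the definition, not a MapReduce program; the class name DiameterSketch is hypothetical, and returning 0 for a group with fewer than two points is an assumption the assignment text does not state.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: maximum pairwise distance within one group of points. */
final class DiameterSketch {
    /** Each point is a {x, y} array. Returns 0.0 for fewer than two points (assumption). */
    static double diameter(List<double[]> points) {
        double best = 0.0;
        for (int i = 0; i < points.size(); i++) {
            for (int j = i + 1; j < points.size(); j++) {
                double dx = points.get(i)[0] - points.get(j)[0];
                double dy = points.get(i)[1] - points.get(j)[1];
                best = Math.max(best, Math.sqrt(dx * dx + dy * dy));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The "lake" group of the sample input.
        List<double[]> lake = Arrays.asList(new double[] {2, 3}, new double[] {9, 8});
        System.out.println(diameter(lake));
    }
}
```

The pairwise loop needs a number of comparisons that grows quadratically with the group size, which is fine for small groups such as those in the sample.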

2. Requirements

  1. Although MapReduce supports multiple languages, you should use Java (Java 8) for the implementation in this assignment.
  2. Your submission should be organized as follows:

<YourStudentID> // your folder name, [Example] 19001234g

|-- Q1.java              // source file for question 1
|-- Q1.jar               // jar file for question 1, compiled and archived from Q1.java
|-- Q2.java              // source file for question 2
|-- Q2.jar               // jar file for question 2, compiled and archived from Q2.java

  3. Archive the above structure as <YourStudentID>.zip and submit this .zip file on Blackboard. [Example] 19001234g.zip

  4. Make sure that your source files compile and run in pseudo-distributed mode on the latest Hadoop version (i.e., Hadoop 3.2.1).

  5. Your jar files should be directly runnable on a Linux platform with the following calls:

bin/hadoop jar Q1.jar Q1 <input path> <output path>

bin/hadoop jar Q2.jar Q2 <input path> <output path>

  6. Your output should preserve double precision.

  7. You should use only one MapReduce round to solve each question.

  8. [Hint] You may use the Ubuntu image we provided for this assignment.

  • Google Drive:

https://drive.google.com/file/d/1lMqmTAj2sC2gVqkVWW-MDUR24vv-a3Si/view?usp=sharing

  • The Y drive in COMP Lab:  Y:\Subject\COMP5434
           Note: These files will expire on November 7!
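
Requirement 6 above is brief. One plausible reading, offered here as an assumption rather than the official interpretation, is that values should be written with Java's default double-to-string conversion instead of being rounded to a fixed number of decimals. The sketch below contrasts the two; the class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper contrasting full-precision and rounded text output. */
final class PrecisionSketch {
    /** Default conversion: parsing the text back yields the same double. */
    static String full(double value) {
        return Double.toString(value);
    }

    /** Fixed three decimals: shorter, but digits beyond the third are lost. */
    static String rounded(double value) {
        return String.format(Locale.ROOT, "%.3f", value);
    }

    public static void main(String[] args) {
        double d = Math.sqrt(74.0); // diameter of the "lake" group in the sample
        System.out.println(full(d));
        System.out.println(rounded(d));
    }
}
```

The Q2 sample output shows three decimals, so it may simply be abbreviated for display.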

3. Grading criteria

20 marks will be given if your programs compile.

  • 10 marks for each .java file

80 marks will be given if your programs are correct. We will test the correctness of your programs using 8 test cases (4 for each question).

  • 10 marks for each test case

Note that this is an individual assignment. Plagiarism will result in 0 marks!
