Levels Taught:
Elementary, Middle School, High School, College, University, PhD
| Teaching Since: | Apr 2017 |
| Last Sign in: | 103 Weeks Ago, 3 Days Ago |
| Questions Answered: | 4870 |
| Tutorials Posted: | 4863 |
MBA IT, Master in Science and Technology
Devry
Jul-1996 - Jul-2000
Professor
Devry University
Mar-2010 - Oct-2016
1-
The source code for your mapper, reducer, and driver classes. Please name
your classes as follows:
MostFrequentDestMapper.java,
MostFrequentDestReducer.java, and
MostFrequentDestDriver.java
2-
Submit the first part of your output on EMR (part-0000)
Assignment 1: MapReduce Warm-up (15 points)

Description

For this assignment, you are given a large dataset of flight arrival and departure details for all commercial flights within the USA between the years 1987-2000. The original dataset is about 5.5 GB and is extracted from here: http://stat-computing.org/dataexpo/2009/. The goal is to write a simple MapReduce program which returns, for each unique carrier, the most frequent flight destination for each major airport.

What is the format of the dataset and how can you access it?

The dataset is a folder consisting of 9 comma-separated files where each file contains flight information for a particular year. For example, 2000.csv contains all flight information for the year 2000. I have uploaded the entire dataset to my S3 bucket; you can access it via the following path:

s3://class-data-set/flight-data

(Note: you will not be able to download the dataset from the above path; you just provide this path as input when running your program on EMR.)

Each file in the data folder consists of 29 columns. The descriptions of the columns are as follows:

| | Name | Description |
| 1 | Year | 1987-2008 |
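One possible reading of the goal is: key each flight record by (carrier, origin airport), then pick the destination that occurs most often for that key. The sketch below simulates that map and reduce logic in plain Java, without any Hadoop classes, so it can be run locally on a few sample lines. It is not the assignment's reference solution. The class name `MostFrequentDestSketch` is hypothetical, and the column positions (UniqueCarrier, Origin, Dest at 0-based indexes 8, 16, 17) and the presence of a header row starting with `Year` are assumptions that should be checked against the full column list and the actual files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map and reduce steps; no Hadoop classes are used.
class MostFrequentDestSketch {
    // ASSUMPTION: 0-based column positions; verify against the full column list.
    static final int CARRIER_COL = 8;  // UniqueCarrier (assumed position)
    static final int ORIGIN_COL = 16;  // Origin (assumed position)
    static final int DEST_COL = 17;    // Dest (assumed position)

    // Map step: one CSV line -> {"carrier,origin", "dest"}.
    // Returns null for lines that are too short or for an assumed header row.
    static String[] mapLine(String line) {
        String[] f = line.split(",", -1);
        if (f.length <= DEST_COL || f[0].equals("Year")) {
            return null;
        }
        return new String[] { f[CARRIER_COL] + "," + f[ORIGIN_COL], f[DEST_COL] };
    }

    // Reduce step: the most frequent destination among the values for one key.
    // Ties go to the alphabetically first destination (TreeMap iteration order).
    static String reduce(List<String> dests) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String d : dests) {
            counts.merge(d, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Stand-in for the shuffle phase: group mapped values by key, then reduce each group.
    static Map<String, String> run(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] kv = mapLine(line);
            if (kv != null) {
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }
}
```

In an actual Hadoop job, the body of `mapLine` would roughly correspond to the map method of MostFrequentDestMapper and the body of `reduce` to the reduce method of MostFrequentDestReducer, with the framework performing the grouping that `run` imitates here.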