Monday, November 17, 2014

Air Pollution Case Study in R

http://www.epa.gov/ttn/airs/airsaqs/detaildata/downloadaqsdata.htm

 

The R script is taken from here:

https://github.com/DataScienceSpecialization/courses/blob/master/04_ExploratoryAnalysis/CaseStudy/script.R

 

Read the file into R without a header

image
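A minimal sketch of this step. The tiny pipe-delimited file below is made up and stands in for the EPA download; the reduced column set and the `toy` file name are assumptions for illustration only.

```r
# Toy stand-in for the EPA file: the header line starts with "#", fields are split by "|".
toy <- tempfile(fileext = ".txt")
writeLines(c(
  "# RD|Action Code|State Code|County Code|Site ID|Date|Sample Value",
  "RD|I|36|63|2008|19990103|12.5",
  "RD|I|36|63|2008|19990106|",
  "RD|I|36|67|1015|19990103|8.1"
), toy)

# header = FALSE because the header line is treated as a comment;
# empty fields are read as NA.
pm0 <- read.table(toy, comment.char = "#", header = FALSE,
                  sep = "|", na.strings = "")
dim(pm0)
head(pm0)
```

The columns come back as V1, V2, ... because no header was read; the real names are attached in the next steps.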

 

Read the first line and split it into the header fields

image
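Something like the following would do it, again on a made-up two-line file (the file contents are an assumption, not the real EPA header):

```r
# Toy file whose first line holds the "|"-separated column labels.
toy <- tempfile(fileext = ".txt")
writeLines(c(
  "# RD|Action Code|State Code|County Code|Site ID|Date|Sample Value",
  "RD|I|36|63|2008|19990103|12.5"
), toy)

# Read only the first line, then split it on the literal "|" character.
first_line <- readLines(toy, n = 1)
cnames <- strsplit(first_line, "|", fixed = TRUE)[[1]]
cnames
```

`fixed = TRUE` matters here: without it, "|" is interpreted as a regular-expression alternation and the line is split into single characters.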

 

Use it as the column names of the data frame

make.names() converts the raw labels into syntactically valid R names (for example, spaces become dots)

image

image
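A small sketch of what make.names() does to labels like these. The four labels and the one-row data frame are made up for illustration:

```r
# Raw labels as they might come out of the header line (illustrative subset).
cnames <- c("# RD", "Action Code", "State Code", "Sample Value")

# One-row toy data frame with default V1..V4 names.
pm0 <- data.frame(V1 = "RD", V2 = "I", V3 = 36, V4 = 12.5)

# Replace the default names with syntactically valid versions of the labels.
names(pm0) <- make.names(cnames)
names(pm0)
```

After this, columns can be accessed with `$`, e.g. `pm0$Sample.Value`.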

 

summary(): shows the Min., Median, Mean, and Max. values

mean(is.na()): gives the proportion of NA values in the data; in the example below it is about 11%

image
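The same two calls on a made-up vector (the values are invented, so the NA share here is 20%, not the 11% seen in the real data):

```r
# Made-up sample values with two missing entries out of ten.
x0 <- c(12.5, NA, 8.1, 20.3, 5.0, NA, 9.9, 14.2, 7.7, 11.0)

summary(x0)             # quartiles, mean, and the NA count
na_share <- mean(is.na(x0))  # is.na() gives TRUE/FALSE; the mean is the share of TRUE
na_share
```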

 

Repeat the steps above to read the 2013 data

image

image

 

boxplot(x0, x1): the result is hard to read

image

image
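One common way to make such a boxplot readable is to plot on a log scale. This is a sketch under that assumption, on simulated right-skewed data; the screenshots above may show something different.

```r
set.seed(1)
# Simulated, right-skewed positive values standing in for the two years.
x0 <- rlnorm(200, meanlog = 2.5, sdlog = 0.8)
x1 <- rlnorm(200, meanlog = 2.0, sdlog = 0.8)

# On the raw scale a few large values squash the boxes; log10 spreads them out.
b <- boxplot(log10(x0), log10(x1), names = c("1999", "2013"))
```

With real data, negative values produce NaN under log10() and R will warn about them, so they drop out of the plot.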

The mean for 2013 is lower than for 1999

image

 

Look at the negative values

image
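A short sketch of how the negative values can be counted, using invented numbers:

```r
# Made-up measurements containing two negatives and one NA.
x1 <- c(9.2, -1.5, 7.0, NA, 3.3, -0.4, 12.1, 6.6)

negative <- x1 < 0                       # logical vector; NA stays NA
n_neg <- sum(negative, na.rm = TRUE)     # how many values are negative
p_neg <- mean(negative, na.rm = TRUE)    # share among the non-missing values
n_neg
p_neg
```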

 

Convert the date format: integer -> character -> Date

image
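The conversion chain, sketched on three made-up dates stored as integers in YYYYMMDD form:

```r
# Dates as they arrive from the file: plain integers such as 19990103.
dates_int <- c(19990103L, 19990106L, 20130704L)

# integer -> character -> Date, with an explicit format string.
dates <- as.Date(as.character(dates_int), format = "%Y%m%d")
str(dates)
```

Once the values are of class Date, plotting functions label the axis with real calendar dates and date arithmetic works as expected.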

 

image

image

site0 has 33 entries and site1 has 21; after intersect(), only 11 are common to both

image
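A sketch of how such site lists can be built and compared. The two small data frames are made up, and the "county.site" string key is an assumption about how the monitors are identified:

```r
# Toy data: one row per measurement, identified by county and site.
pm0 <- data.frame(County.Code = c(1, 1, 5, 63),   Site.ID = c(5, 5, 80, 2008))
pm1 <- data.frame(County.Code = c(1, 63, 63, 101), Site.ID = c(5, 2008, 2008, 3))

# Distinct county/site pairs in each year.
site0 <- unique(pm0[, c("County.Code", "Site.ID")])
site1 <- unique(pm1[, c("County.Code", "Site.ID")])

# Collapse each pair into a single "county.site" key.
site0 <- paste(site0[, 1], site0[, 2], sep = ".")
site1 <- paste(site1[, 1], site1[, 2], sep = ".")

# Monitors present in both years.
both <- intersect(site0, site1)
both
```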

Create a new field, county.site, then keep only the rows for monitors that appear in both years

image
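Roughly, on made-up data (the `both` vector is typed in by hand here; in the walkthrough it comes from intersect()):

```r
# Toy measurements for one year.
pm0 <- data.frame(
  County.Code  = c(1, 5, 63, 63),
  Site.ID      = c(5, 80, 2008, 2008),
  Sample.Value = c(10.1, 7.4, 12.5, 9.8)
)
both <- c("1.5", "63.2008")  # assumed result of the intersect() step

# Add the combined key, then keep rows whose key is in both years.
pm0$county.site <- with(pm0, paste(County.Code, Site.ID, sep = "."))
cnt0 <- subset(pm0, county.site %in% both)
cnt0
```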

Count the observations per county.site; for example, 63.2008 has 122 rows

image
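One way to get these counts is to split the data frame by the key and count rows in each piece. The data below are invented, so the counts differ from the 122 above:

```r
# Toy subset with a county.site key per row.
cnt0 <- data.frame(
  county.site  = c("1.5", "63.2008", "63.2008", "63.2008", "1.5"),
  Sample.Value = c(10.1, 12.5, 9.8, 11.2, 8.7)
)

# split() makes one data frame per key; nrow() counts its observations.
counts <- sapply(split(cnt0, cnt0$county.site), nrow)
counts
```

Comparing these counts across the two years helps pick a monitor with enough observations in both.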

Choose county 63 and site ID 2008

image
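A sketch of the selection with subset(). The State.Code filter and its value 36 are assumptions added for illustration; the rows are made up:

```r
# Toy data covering two states and two monitors.
pm0 <- data.frame(
  State.Code   = c(36, 36, 36, 6),
  County.Code  = c(63, 63, 67, 63),
  Site.ID      = c(2008, 2008, 1015, 2008),
  Sample.Value = c(12.5, 9.8, 8.1, 15.0)
)

# Keep a single monitor: one state, one county, one site.
pm0sub <- subset(pm0, State.Code == 36 & County.Code == 63 & Site.ID == 2008)
pm0sub
```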

Plot the x0sub data; the dates on the x-axis are still in the wrong format

image

image

Convert the date format and plot again

image

image
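A sketch of the fix, assuming the subset has an integer Date column and a Sample.Value column (both names and all values are illustrative):

```r
# Toy single-monitor subset with YYYYMMDD integer dates.
pm0sub <- data.frame(
  Date         = c(19990703L, 19990709L, 19990715L, 19990721L),
  Sample.Value = c(12.5, 9.8, 14.1, 7.3)
)

# Convert to Date so the x-axis shows calendar dates, not raw integers.
dates0 <- as.Date(as.character(pm0sub$Date), format = "%Y%m%d")
x0sub  <- pm0sub$Sample.Value

plot(dates0, x0sub, pch = 20, xlab = "Date", ylab = "Sample value")
```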

Plot the x1sub data with the dates on the x-axis in the correct format

image

image

Plot the two subsets in the same figure; note that their y ranges differ

image

image

 

Find the global range and plot both again

image

image

image
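A sketch of the shared-range idea on made-up data: compute one range over both subsets and pass it as `ylim` to each plot so the two panels are directly comparable.

```r
# Toy values and dates for the two years.
dates0 <- as.Date(c("1999-07-03", "1999-07-09", "1999-07-15"))
x0sub  <- c(12.5, 21.0, 14.1)
dates1 <- as.Date(c("2013-01-02", "2013-01-08", "2013-01-14"))
x1sub  <- c(6.2, 3.9, 8.4)

# One range covering both subsets.
rng <- range(x0sub, x1sub, na.rm = TRUE)

# Two panels side by side, both using the same y-axis limits.
par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))
plot(dates0, x0sub, pch = 20, ylim = rng, xlab = "1999", ylab = "Sample value")
plot(dates1, x1sub, pch = 20, ylim = rng, xlab = "2013", ylab = "Sample value")
```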

 

Compute the state-wide means and make a plot showing the trend

image
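State-wide means can be computed with tapply(). A sketch on invented data, assuming State.Code and Sample.Value columns:

```r
# Toy measurements from two states, one value missing.
pm0 <- data.frame(
  State.Code   = c(1, 1, 36, 36, 36),
  Sample.Value = c(10, 14, 8, NA, 12)
)

# Mean of Sample.Value within each state, ignoring NA.
mn0 <- with(pm0, tapply(Sample.Value, State.Code, mean, na.rm = TRUE))
mn0
```

The result is indexed by state code, so `names(mn0)` gives the states as character strings.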

Convert mn0 and mn1 to data frames, then merge them

image

image
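A sketch of the conversion and merge. The means are made up and typed in as named vectors; the names d0, d1, and mrg are chosen for illustration:

```r
# Toy state-wide means for the two years, named by state code.
mn0 <- c("1" = 12, "36" = 10, "6" = 20)
mn1 <- c("1" = 9,  "36" = 7,  "72" = 5)

# One data frame per year with the state as an explicit column.
d0 <- data.frame(state = names(mn0), mean = mn0)
d1 <- data.frame(state = names(mn1), mean = mn1)

# Inner join on state: only states present in both years survive.
mrg <- merge(d0, d1, by = "state")
mrg
```

Because both inputs have a column called `mean`, merge() renames them to `mean.x` (first data frame) and `mean.y` (second).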

Draw a line connecting the two points for each state

image

image
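A sketch of such a plot with base graphics, on a made-up merged table: each state gets a point at x = 1 (first year) and x = 2 (second year), joined by a segment.

```r
# Toy merged table: one row per state, one mean per year.
mrg <- data.frame(
  state  = c("1", "36", "6"),
  mean.x = c(12, 10, 20),  # first year
  mean.y = c(9, 7, 15)     # second year
)
n <- nrow(mrg)

# Points for the first year at x = 1; leave room for the second year at x = 2.
with(mrg, plot(rep(1, n), mean.x, xlim = c(0.5, 2.5),
               ylim = range(mean.x, mean.y), xaxt = "n",
               xlab = "", ylab = "State-wide mean"))
axis(1, at = 1:2, labels = c("1999", "2013"))

# Points for the second year, then one segment per state.
with(mrg, points(rep(2, n), mean.y))
with(mrg, segments(rep(1, n), mean.x, rep(2, n), mean.y))
```

A downward-sloping segment means the state's mean fell between the two years.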
