Course project for 'Getting and Cleaning Data'
Logic for run_analysis.R
-
First, set the working directory to "C:\DataScience\R-Programming\CleanData\data\UCI HAR Dataset"
-
a. Read features.txt into the features table. b. Extract V2, the second column of features, into feats, a character vector holding the names of all 561 variables.
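
A minimal sketch of this step. It assumes features.txt has one index and one feature name per line. A three-line temporary file stands in for the real 561-line file so the snippet runs on its own.

```r
# Stand-in for features.txt (illustrative lines only, not the full file).
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 tBodyAcc-mean()-X",
             "2 tBodyAcc-mean()-Y",
             "3 tBodyAcc-std()-X"), tmp)

# read.table names the unnamed columns V1, V2, ...
features <- read.table(tmp, stringsAsFactors = FALSE)

# Keep only the second column as a character vector of variable names.
feats <- as.character(features$V2)
print(feats)
```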
-
Read the activity labels (walking, sitting, etc.) from 'activity_labels.txt'.
-
a. Read ./train/X_train.txt into the X_train table. b. Set the column names of X_train to the feature names held in feats from step 2.
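
The following sketch shows how the measurement table might be read and labelled. The numbers are made up, and a temporary file stands in for ./train/X_train.txt. Assigning with colnames<- keeps names such as "tBodyAcc-mean()-X" exactly as written.

```r
# Feature names as produced in step 2 (shortened to three for illustration).
feats <- c("tBodyAcc-mean()-X", "tBodyAcc-mean()-Y", "tBodyAcc-std()-X")

# Stand-in for X_train.txt: whitespace-separated numbers, one row per observation.
tmp <- tempfile(fileext = ".txt")
writeLines(c("0.28 -0.02 -0.13",
             "0.27 -0.01 -0.12"), tmp)

X_train <- read.table(tmp)

# Replace the default V1, V2, ... with the descriptive feature names.
colnames(X_train) <- feats
print(X_train)
```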
-
Read ./train/y_train.txt into the Y_train table; it holds the activity codes for each subject's observations.
-
a. Read ./train/subject_train.txt into the subject_train table. b. Set "PersonId" as the column name of subject_train.
-
Next, merge Y_train with the activity labels so that every activity code in Y_train is mapped to its descriptive name.
-
a. The result of step 7 is stored in the data frame 'train_activities'. b. Set the column name of that data frame to "Activity".
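
One point worth checking here: by default merge() sorts its result by the key column, so the merged rows may no longer line up with the rows of X_train. The sketch below is a substitute technique, not necessarily what run_analysis.R does: it uses match() to look up each code, which keeps the original row order. The activity codes in Y_train are made up for illustration.

```r
# Lookup table in the layout of activity_labels.txt (code, name).
activity_labels <- data.frame(
  V1 = 1:6,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
         "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Illustrative activity codes, one per observation.
Y_train <- data.frame(V1 = c(5, 1, 4, 1))

# match() returns the row of the label table for each code, in input order.
train_activities <- data.frame(
  Activity = activity_labels$V2[match(Y_train$V1, activity_labels$V1)],
  stringsAsFactors = FALSE
)
print(train_activities)
```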
-
Column-bind subject_train, train_activities and X_train with cbind ----> resulting in finalTrainData
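
A small sketch of the column bind, using made-up three-row tables in place of the real ones. All three tables must have the same number of rows, in the same order.

```r
subject_train <- data.frame(PersonId = c(1, 1, 3))
train_activities <- data.frame(Activity = c("STANDING", "WALKING", "SITTING"),
                               stringsAsFactors = FALSE)
# check.names = FALSE keeps the original feature names untouched.
X_train <- data.frame("tBodyAcc-mean()-X" = c(0.28, 0.27, 0.29),
                      "tBodyAcc-std()-X"  = c(-0.13, -0.12, -0.11),
                      check.names = FALSE)

# Subject first, then activity, then the measurements.
finalTrainData <- cbind(subject_train, train_activities, X_train)
print(finalTrainData)
```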
-
Repeat steps 2 to 9 for the test data as well ----> resulting in finalTestData
-
Finally, combine finalTrainData and finalTestData row-wise using rbind.
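
As a sketch with made-up rows: rbind on data frames matches columns by name, so both tables need identical column names. The name allData is hypothetical and does not come from the script.

```r
finalTrainData <- data.frame(PersonId = c(1, 3),
                             Activity = c("WALKING", "SITTING"),
                             m1 = c(0.28, 0.27),
                             stringsAsFactors = FALSE)
finalTestData <- data.frame(PersonId = 2,
                            Activity = "LAYING",
                            m1 = 0.25,
                            stringsAsFactors = FALSE)

# Stack the training rows on top of the test rows.
allData <- rbind(finalTrainData, finalTestData)   # hypothetical name
print(allData)
```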
-
Use grep to keep only the columns whose names contain mean or std, plus the other two columns (PersonId and Activity).
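
The exact pattern is not given above, so the one below is an assumption. It shows how a single grep call could select the two identifier columns together with the mean and std columns. Because grep is case-sensitive by default, a lowercase "mean" also matches meanFreq() names but not names such as angle(tBodyAccMean,gravity).

```r
cols <- c("PersonId", "Activity",
          "tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
          "fBodyAcc-meanFreq()-X", "angle(tBodyAccMean,gravity)")

# Assumed pattern: identifiers, or any name containing "mean" or "std".
keep <- grep("PersonId|Activity|mean|std", cols)
print(cols[keep])
```

With a data frame, the same index vector can be used as `df[, keep]`.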
-
Use the aggregate function to calculate the means of all variables in columns 3 to 81, grouped by the first 2 columns (PersonId and Activity).
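
A minimal sketch of the aggregation on a made-up table with two measurement columns instead of 79. It assumes the grouping is by PersonId and Activity; the names d and means are hypothetical.

```r
d <- data.frame(PersonId = c(1, 1, 1, 2, 2),
                Activity = c("WALKING", "WALKING", "SITTING",
                             "WALKING", "WALKING"),
                m1 = c(1, 3, 5, 2, 4),
                m2 = c(10, 20, 30, 40, 60),
                stringsAsFactors = FALSE)

# Average every measurement column (3 onwards) within each
# PersonId/Activity combination.
means <- aggregate(d[, 3:ncol(d)],
                   by = list(PersonId = d$PersonId, Activity = d$Activity),
                   FUN = mean)
print(means)
```

The result has one row per subject and activity pair that occurs in the data.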
-
Write the means from step 13 to a file with write.table.
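
A sketch of the final write, using a made-up two-row table. The output path is a temporary file and row.names = FALSE is an assumption; neither is specified above. Reading the file back is a quick way to confirm it round-trips.

```r
means <- data.frame(PersonId = c(1, 2),
                    Activity = c("WALKING", "SITTING"),
                    m1 = c(0.5, 0.25),
                    stringsAsFactors = FALSE)

out <- tempfile(fileext = ".txt")   # hypothetical output location

# row.names = FALSE (assumed) omits the 1, 2, ... row labels from the file.
write.table(means, file = out, row.names = FALSE)

# Read the file back to check its shape and header.
back <- read.table(out, header = TRUE, stringsAsFactors = FALSE)
print(back)
```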