Week 1 - Univariate Analyses Assignment
Bhupendra Mishra
Monday, July 27, 2015
a. Generate a random sample of 500 observations from the Ecommerce data using R and save it as a data frame.
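# Read the tab-delimited e-commerce file (read.delim expects a header row);
# stringsAsFactors = TRUE keeps churn_status a factor, the default before R 4.0
ecommerce <- read.delim("C:/Users/Bhupendra Mishra/Desktop/donotbackup/BridgeSchoolMgmt/Bridge School Mgmt/Module2/ecommerce.txt", stringsAsFactors = TRUE)
# Draw 500 rows at random, without replacement
ecomm_samp <- ecommerce[sample(1:nrow(ecommerce), 500), ]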
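sample() draws a different subset every time the code runs, and no seed is set above, so rerunning this step will not reproduce the summaries that follow exactly. A minimal sketch, assuming a hypothetical stand-in data frame `demo`, a hypothetical helper `draw_sample()` and an arbitrary seed of 123 (none of these come from the original analysis), shows how calling set.seed() first makes the draw repeatable:

```r
# Hypothetical stand-in for the ecommerce data; the real file is not reproduced here
demo <- data.frame(
  id = 1:2000,
  churn_status = factor(rep(c("Churned", "Stayed"), length.out = 2000))
)

# Hypothetical helper: fix the random-number seed, then draw n rows without replacement
draw_sample <- function(df, n, seed) {
  set.seed(seed)
  df[sample(nrow(df), n), ]
}

s1 <- draw_sample(demo, 500, seed = 123)
s2 <- draw_sample(demo, 500, seed = 123)

cat("rows drawn:", nrow(s1), "\n")
cat("same rows on both runs:", identical(s1, s2), "\n")
cat("any row drawn twice:", anyDuplicated(s1$id) > 0, "\n")
```

With the real data this would read `ecomm_samp <- draw_sample(ecommerce, 500, seed = 123)`. Any fixed seed works; the point is only that the same seed gives the same 500 rows.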
b. Generate univariate profiles of the data using the summary() function.
attach(ecomm_samp)
summary(ecomm_samp)
## churn_status  session_length_seconds  session_count   event_count
## Churned:227   Min.   :     0          Min.   :  1.00  Min.   :    1.00
## Stayed :273   1st Qu.:  1497          1st Qu.:  4.00  1st Qu.:   40.25
##               Median :  7549          Median : 16.50  Median :  200.00
##               Mean   : 30605          Mean   : 62.92  Mean   :  623.54
##               3rd Qu.: 32575          3rd Qu.: 80.50  3rd Qu.:  746.25
##               Max.   :616183          Max.   :695.00  Max.   :10931.00
## closed_session_event_count  open_session_event_count
## Min.   :   0.0              Min.   :   0.0
## 1st Qu.:   6.0              1st Qu.:   6.0
## Median :  28.0              Median :  26.5
## Mean   : 106.2              Mean   : 105.8
## 3rd Qu.: 127.5              3rd Qu.: 128.5
## Max.   :1717.0              Max.   :1714.0
## quest_completed_event_count  store_purchase_event_count  active_days
## Min.   :   0.00              Min.   :  0.000             Min.   : 1.00
## 1st Qu.:   4.00              1st Qu.:  0.000             1st Qu.: 2.75
## Median :  21.00              Median :  0.000             Median : 7.00
## Mean   : 127.51              Mean   :  5.126             Mean   :15.13
## 3rd Qu.:  79.75              3rd Qu.:  3.000             3rd Qu.:24.00
## Max.   :4419.00              Max.   :243.000             Max.   :55.00
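Inferences:
- churn_status is the only categorical variable; the remaining eight variables are numeric.
- Churn is roughly balanced: 227 customers churned and 273 stayed (about 45:55).
- The numeric variables do not look normally distributed: each has a long right tail, e.g. session_length_seconds has a median of 7549 but a maximum of 616183.
- The mean exceeds the median for every numeric variable, which suggests the data are positively (right) skewed.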
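Comparing the mean with the median is only a rough indicator of skew. A minimal sketch of two more direct checks, using a hypothetical log-normal vector `x` in place of a real column: a moment-based skewness coefficient (the helper `skewness()` is written out by hand here, so no extra package is assumed) and base R's shapiro.test() for normality.

```r
# Hypothetical right-skewed vector standing in for a column such as session_length_seconds
set.seed(42)
x <- rlnorm(500, meanlog = 9, sdlog = 1.5)

# Moment-based sample skewness: third central moment over the second raised to 1.5
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

mean_gt_median <- mean(x) > median(x)
skew <- skewness(x)
sw <- shapiro.test(x)  # null hypothesis: the data come from a normal distribution

cat("mean > median:", mean_gt_median, "\n")
cat("skewness:", round(skew, 2), "\n")
cat("Shapiro-Wilk p-value:", signif(sw$p.value, 3), "\n")
```

On the sample itself, `sapply(Filter(is.numeric, ecomm_samp), skewness)` would return one coefficient per numeric column, with values well above 0 indicating right skew. shapiro.test() accepts between 3 and 5000 observations, which the 500-row sample satisfies.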
c. Generate pairwise correlation plots of the numeric variables and bar charts of the categorical (factor) variables.
pairs(~ session_length_seconds + session_count + closed_session_event_count + open_session_event_count + event_count + quest_completed_event_count + store_purchase_event_count + active_days,
      data = ecomm_samp,
      main = "Pairwise correlation plot")
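Inference:
- Every pair of numeric variables is positively correlated. The cor() matrix computed below confirms this: the coefficients range from 0.39 (store_purchase_event_count vs. active_days) to 0.9999 (closed_session_event_count vs. open_session_event_count).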
barplot(table(churn_status))
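A bar chart of raw counts shows the roughly even churn split but does not quantify it. A minimal sketch that adds proportions with base R's prop.table() and writes each class's percentage above its bar, using a hypothetical factor `churn` built to match the 227/273 counts reported by summary() (on the real data, `table(ecomm_samp$churn_status)` would replace `table(churn)`):

```r
# Hypothetical factor reproducing the 227 churned / 273 stayed split in the summary
churn <- factor(rep(c("Churned", "Stayed"), times = c(227, 273)))

counts <- table(churn)
props  <- prop.table(counts)  # each count divided by the total
print(round(props, 3))

# Bar chart of counts with each class's percentage written above its bar
bp <- barplot(counts, ylim = c(0, max(counts) * 1.15),
              main = "Churn status", ylab = "Customers")
text(bp, as.vector(counts), labels = sprintf("%.1f%%", 100 * as.vector(props)), pos = 3)
```

The extra 15% of headroom in `ylim` is only there so the percentage labels are not clipped at the top of the plot.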
vars<-c("session_length_seconds","session_count","closed_session_event_count", "open_session_event_count", "event_count","quest_completed_event_count","store_purchase_event_count","active_days" )
cor(ecomm_samp[vars])
##                             session_length_seconds session_count
## session_length_seconds                   1.0000000     0.8503394
## session_count                            0.8503394     1.0000000
## closed_session_event_count               0.9430646     0.9473013
## open_session_event_count                 0.9418005     0.9474239
## event_count                              0.9596580     0.8924715
## quest_completed_event_count              0.8380808     0.6582946
## store_purchase_event_count               0.5607137     0.4329665
## active_days                              0.6299118     0.8381357
##                             closed_session_event_count
## session_length_seconds                       0.9430646
## session_count                                0.9473013
## closed_session_event_count                   1.0000000
## open_session_event_count                     0.9998514
## event_count                                  0.9574356
## quest_completed_event_count                  0.7595074
## store_purchase_event_count                   0.5387210
## active_days                                  0.7450125
##                             open_session_event_count event_count
## session_length_seconds                     0.9418005   0.9596580
## session_count                              0.9474239   0.8924715
## closed_session_event_count                 0.9998514   0.9574356
## open_session_event_count                   1.0000000   0.9578605
## event_count                                0.9578605   1.0000000
## quest_completed_event_count                0.7605956   0.8848637
## store_purchase_event_count                 0.5372247   0.5439314
## active_days                                0.7434909   0.6958241
##                             quest_completed_event_count
## session_length_seconds                        0.8380808
## session_count                                 0.6582946
## closed_session_event_count                    0.7595074
## open_session_event_count                      0.7605956
## event_count                                   0.8848637
## quest_completed_event_count                   1.0000000
## store_purchase_event_count                    0.4610465
## active_days                                   0.4731775
##                             store_purchase_event_count active_days
## session_length_seconds                       0.5607137   0.6299118
## session_count                                0.4329665   0.8381357
## closed_session_event_count                   0.5387210   0.7450125
## open_session_event_count                     0.5372247   0.7434909
## event_count                                  0.5439314   0.6958241
## quest_completed_event_count                  0.4610465   0.4731775
## store_purchase_event_count                   1.0000000   0.3898231
## active_days                                  0.3898231   1.0000000
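Reading signs off an 8 x 8 matrix by eye is error-prone, so the "all pairs positive" inference can be checked programmatically. Because the summary showed heavy right skew, Spearman's rank correlation is also worth comparing with the default Pearson coefficient. A minimal sketch on a hypothetical data frame `demo` with three invented columns (not the real sample):

```r
# Hypothetical skewed, positively related columns standing in for ecomm_samp[vars]
set.seed(1)
latent <- rlnorm(500, meanlog = 3, sdlog = 1)
demo <- data.frame(
  session_count = latent,
  event_count   = latent * rlnorm(500, meanlog = 2, sdlog = 0.5),
  active_days   = sqrt(latent) + rexp(500)
)

pearson  <- cor(demo)                       # default: Pearson correlation
spearman <- cor(demo, method = "spearman")  # correlation of ranks, less sensitive to extreme values

# Each pair once: upper triangle only, diagonal excluded
off_diag <- pearson[upper.tri(pearson)]
cat("all pairs positive:", all(off_diag > 0), "\n")

# Table of pairs, strongest Pearson correlation first
idx <- which(upper.tri(pearson), arr.ind = TRUE)
pair_tab <- data.frame(
  var1     = rownames(pearson)[idx[, "row"]],
  var2     = colnames(pearson)[idx[, "col"]],
  pearson  = round(pearson[idx], 3),
  spearman = round(spearman[idx], 3)
)
print(pair_tab[order(-pair_tab$pearson), ], row.names = FALSE)
```

Replacing `demo` with `ecomm_samp[vars]` would run the same checks on all eight variables. A pair whose Pearson and Spearman values differ markedly is one whose linear correlation may be driven by a few extreme customers.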
library(car)  # provides scatterplotMatrix()
scatterplotMatrix(ecomm_samp[vars])
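Inferences:
- With car's default settings at the time, the green line in each panel is the ordinary least-squares (linear) regression line, whereas the solid red line is a nonparametric (loess) smoother that follows the local relationship between the two variables; the dashed red lines around it show the spread of the data about the smoother, not a confidence interval.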
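The two kinds of line can be reproduced by hand for a single pair of variables. A minimal sketch, using a hypothetical concave relationship between invented variables `x` and `y`: lm() gives the straight least-squares line and loess() a local, nonparametric smoother that bends with the data. The green/red colours only echo the panels above; this is not car's own plotting code.

```r
# Hypothetical pair with a curved (concave) relationship
set.seed(7)
x <- runif(300, 1, 50)
y <- 10 * log(x) + rnorm(300, sd = 2)

lin <- lm(y ~ x)                  # straight least-squares line
smo <- loess(y ~ x, span = 0.75)  # local regression smoother; 0.75 is loess()'s default span

# Predict both fits over the observed range of x (loess does not extrapolate by default)
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
plot(x, y, main = "Least-squares line vs. loess smoother")
lines(grid$x, predict(lin, newdata = grid), col = "green3", lwd = 2)
lines(grid$x, predict(smo, newdata = grid), col = "red", lwd = 2)

# In-sample root-mean-square error of each fit
rmse_lin <- sqrt(mean(residuals(lin)^2))
rmse_smo <- sqrt(mean(residuals(smo)^2))
cat("RMSE linear:", round(rmse_lin, 2), "\n")
cat("RMSE loess: ", round(rmse_smo, 2), "\n")
```

The loess curve has the lower in-sample error here because the underlying relationship is curved. Where a panel's red smoother stays close to its green line, a linear summary such as the Pearson correlation is probably adequate for that pair. Argument names and default colours of scatterplotMatrix() have changed across car versions, so `?scatterplotMatrix` for the installed version is the authority.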