High-level summary of the data file and available variables.
item | value | item | value |
---|---|---|---|
Size (KB) | 47.4 | | |
Observations | 891 | Numeric Variables | 7 |
Variables | 9 | Non-Numeric Variables | 2 |
Note that a variable is counted as numeric if its type is double, integer, complex, logical or numeric.
Information on each available variable.
variable | type | mean | sd | missing data | missing data (%) |
---|---|---|---|---|---|
PassengerId | integer | 446 | 257.354 | 0 | 0 % |
Survived | integer | 0.384 | 0.487 | 0 | 0 % |
Pclass | integer | 2.309 | 0.836 | 0 | 0 % |
Sex | character | - | - | 0 | 0 % |
Age | double | 29.699 | 14.526 | 177 | 19.865 % |
SibSp | integer | 0.523 | 1.103 | 0 | 0 % |
Parch | integer | 0.382 | 0.806 | 0 | 0 % |
Fare | double | 32.204 | 49.693 | 0 | 0 % |
Embarked | character | - | - | 0 | 0 % |
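The columns of the table above can be reproduced with a few lines of code. The following is a minimal sketch in Python using only the standard library; the function name `summarise_column`, the use of `None` for missing entries, and the choice of the sample standard deviation (n - 1 denominator) are assumptions of this sketch, not details confirmed by the report.

```python
import statistics


def summarise_column(values):
    """Return mean, sd, missing count and missing % for one numeric column.

    Missing entries are represented as None (an assumption of this sketch).
    The sd is the sample standard deviation (n - 1 denominator), also assumed.
    """
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    return {
        "mean": round(statistics.mean(present), 3) if present else None,
        "sd": round(statistics.stdev(present), 3) if len(present) > 1 else None,
        "missing": n_missing,
        "missing_pct": round(100 * n_missing / len(values), 3) if values else None,
    }


# Toy column for illustration only; these values are not taken from the data file.
toy = [22.0, 38.0, None, 35.0, None, 54.0]
summary = summarise_column(toy)
print(summary)
```

The missing-data percentage is the missing count divided by the number of observations, which is how 177 missing values out of 891 observations gives the 19.865 % reported for Age.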
Understanding the distribution of numeric data is useful for informing data cleaning and modelling. Numeric data is assumed to be continuous for the creation of these distributions, so integer-coded variables such as Survived and Pclass are treated as continuous as well.
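One common way to summarise the distribution of a continuous variable is an equal-width histogram. The sketch below is illustrative only: the report does not state how its distributions are built, and the function name `histogram`, the default of 10 bins, and the use of `None` for missing values are assumptions.

```python
def histogram(values, bins=10):
    """Count values in equal-width bins, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in present:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Toy values for illustration only; not taken from the data file.
edges, counts = histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 10, None], bins=5)
print(edges, counts)
```

Treating an integer-coded variable as continuous in this way spreads a handful of distinct values across many bins, which is worth keeping in mind when reading such plots.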
Categorical data is explored through the frequencies of occurrence of each category.
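Counting category frequencies can be sketched with the standard library as follows. The function name `category_frequencies` and the choice to report a percentage alongside each count are assumptions made for illustration.

```python
from collections import Counter


def category_frequencies(values):
    """Map each category to (count, percentage), most frequent first."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (count, round(100 * count / total, 1))
        for category, count in counts.most_common()
    }


# Toy labels for illustration only; not taken from the data file.
freqs = category_frequencies(["a", "b", "a", "c", "a", "b"])
print(freqs)
```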
Linear correlation between a pair of variables yields a value between -1 and 1. Values of 1 and -1 correspond to perfect positive and negative linear relationships respectively, while values close to zero suggest no linear relationship between the variable pair.
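The linear correlation described here is usually the Pearson coefficient. The sketch below assumes that is the measure in use and that rows with a missing value in either variable are dropped before the calculation; neither detail is confirmed by the report, and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length columns.

    Rows where either value is None are dropped (an assumption of this sketch).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Toy columns for illustration only; not taken from the data file.
r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])      # perfectly increasing line
r_down = pearson([1, 2, 3, 4], [8, 6, 4, 2])    # perfectly decreasing line
r_curve = pearson([1, 2, 3, 4], [1, -1, -1, 1])  # symmetric U shape
print(r_up, r_down, r_curve)
```

The last pair is clearly related, yet its coefficient is zero because the relationship is not linear. This is why a value near zero only rules out a linear relationship.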
Scatter plots for each pair of variables.
Scatter plots and regressions for the three strongest pairwise correlations.
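Selecting the strongest pairs and fitting a regression line to each can be sketched as below. This assumes that strength is ranked by the absolute Pearson coefficient, that the regression is an ordinary least-squares straight line, and that columns are complete (no missing values); the names `strongest_pairs` and `fit_line` are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two complete, equal-length columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strongest_pairs(columns, k=3):
    """Return the k column-name pairs with the largest absolute correlation."""
    scored = [
        (abs(pearson(columns[a], columns[b])), a, b)
        for a, b in combinations(columns, 2)
    ]
    scored.sort(reverse=True)
    return [(a, b) for _, a, b in scored[:k]]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


# Toy columns for illustration only; not taken from the data file.
toy_columns = {
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [1, -1, -1, 1],
}
top_pair = strongest_pairs(toy_columns, k=1)
slope, intercept = fit_line([1, 2, 3], [3, 5, 7])
print(top_pair, slope, intercept)
```

Ranking by absolute value means a strong negative relationship is treated as just as interesting as a strong positive one.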