Summary

Data file information

High-level summary of the data file and its variables.

Property                 Value
Size (KB)                47.4
Observations             891
Variables                9
Numeric Variables        7
Non-Numeric Variables    2

Note that numeric data may be of any of the following types: double, integer, complex, logical or numeric.

Variable Details

Information on each available variable.

Variable      Type       Mean    SD       Missing  Missing (%)
PassengerId   integer    446     257.354  0        0
Survived      integer    0.384   0.487    0        0
Pclass        integer    2.309   0.836    0        0
Sex           character  -       -        0        0
Age           double     29.699  14.526   177      19.865
SibSp         integer    0.523   1.103    0        0
Parch         integer    0.382   0.806    0        0
Fare          double     32.204  49.693   0        0
Embarked      character  -       -        0        0
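
As a rough illustration of how the per-variable figures above could be derived, the sketch below summarises a single column. It is not the code that produced this report. The function name `summarise`, the use of `None` as the missing-value marker, and the choice of the sample standard deviation are all assumptions, and the toy column is hypothetical.

```python
import statistics

def summarise(values):
    """Summarise one non-empty column; None marks a missing entry (assumed convention)."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    # Treat a column as numeric only if every present value is an int or float.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return {
        "mean": round(statistics.mean(present), 3) if numeric and present else None,
        # Sample standard deviation (n - 1 denominator); this is an assumption.
        "sd": round(statistics.stdev(present), 3) if numeric and len(present) > 1 else None,
        "missing": n_missing,
        "missing_pct": round(100 * n_missing / len(values), 3),
    }

# Hypothetical toy column, not taken from the data file described above.
ages = [22.0, 38.0, None, 35.0, None]
summary = summarise(ages)

# Consistency check on the reported Age row: 177 missing out of 891 observations.
age_missing_pct = round(100 * 177 / 891, 3)
```

The last line reproduces the percentage shown for Age from the reported counts alone, which is a quick way to sanity-check the missing-data columns.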

Distributions of variables

Numeric distributions

Understanding the distribution of numeric data is useful for informing data cleaning and modelling. Numeric data is assumed to be continuous for the creation of these distributions.
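
One common way to view a continuous distribution is an equal-width histogram. The sketch below shows that binning step under stated assumptions: the function name `histogram`, the bin count, and the toy values are hypothetical, and the report itself may use a different method (for example, density estimates).

```python
def histogram(values, bins=4):
    """Count values in equal-width bins; missing entries (None) are dropped first."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical: place everything in the first bin.
        return [lo] * (bins + 1), [len(present)] + [0] * (bins - 1)
    counts = [0] * bins
    for v in present:
        # The maximum value would land one bin past the end, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# Hypothetical toy values, not taken from the data file described above.
edges, counts = histogram([0.0, 1.0, 2.0, 3.0, 4.0, 8.0, None], bins=4)
```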

Categorical distributions

Categorical data is explored through the frequencies of occurrence of each category.
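
A frequency count of this kind can be sketched with the standard library. The toy values and the variable names `counts` and `shares` below are hypothetical and are not taken from the data file.

```python
from collections import Counter

# Hypothetical toy column of category labels.
sexes = ["male", "female", "female", "male", "male"]

# Absolute frequency of each category.
counts = Counter(sexes)

# Relative frequency (share of all observations) of each category.
shares = {category: n / len(sexes) for category, n in counts.items()}
```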

Correlations between variables

Correlation Matrix

The linear correlation between two variables takes values between -1 and 1. Values of 1 and -1 correspond to perfect positive and perfect negative linear relationships respectively, while values close to zero suggest no linear relationship between the variable pair.
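
To make the range of the coefficient concrete, here is a minimal sketch of the Pearson correlation coefficient, assuming that is the linear correlation measure meant here. The function name `pearson` and the toy inputs are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy inputs showing the two extremes of the scale.
r_positive = pearson([1, 2, 3], [2, 4, 6])   # perfectly increasing line
r_negative = pearson([1, 2, 3], [6, 4, 2])   # perfectly decreasing line
```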

Pairwise Matrix

Scatter plots for each pair of variables.

Regressions

Scatter plots and regressions for the three strongest pairwise correlations.
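
The selection step can be sketched as follows, assuming "strongest" means largest absolute Pearson correlation and that each regression is a simple least-squares line. The helper names (`pearson`, `strongest_pairs`, `fit_line`) and the toy columns are hypothetical and not taken from the data file.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strongest_pairs(columns, k=3):
    """Return the k column pairs with the largest absolute correlation."""
    scored = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    scored.sort(key=lambda item: abs(item[2]), reverse=True)
    return scored[:k]

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical toy columns.
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 4, 3, 2, 1],
    "d": [1, 3, 2, 5, 4],
}
top = strongest_pairs(columns, k=3)
slope, intercept = fit_line(columns["a"], columns["b"])
```

Ranking by absolute value keeps strong negative relationships in the selection, which ranking by the signed coefficient would not.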