pca_numpy

compas.numerical.pca_numpy(data)

Compute the principal components of a set of data points.

Parameters:
data : list

A list of m observations, measuring n variables. For example, if the data are points in 2D space, the data parameter should contain m nested lists of 2 variables, the x and y coordinates.

Returns:
tuple
  • The mean of the data points.

  • The principal directions. The number of principal directions is equal to the dimensionality of the data. For example, if the data points are locations in 3D space, three principal directions will be returned. If the data points are locations in 2D space, only two principal directions will be returned.

  • The spread of the data along the principal directions.

Raises:
ValueError

If the number of observations is smaller than the number of measured variables.

Notes

PCA of a dataset finds the directions along which the variance of the data is largest, i.e. the directions along which the data is most spread out.
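The idea can be illustrated with a minimal NumPy sketch. This is not the COMPAS implementation: the function name `pca_sketch`, the use of a singular value decomposition, and the choice to report spread as sample variance are all assumptions made for illustration.

```python
import numpy as np


def pca_sketch(data):
    """Hypothetical helper, not part of COMPAS: PCA via SVD of the centered data."""
    X = np.asarray(data, dtype=float)
    m, n = X.shape  # m observations, n variables
    if m < n:
        raise ValueError("Fewer observations than measured variables.")
    mean = X.mean(axis=0)
    Y = X - mean  # center the data on its mean
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Assumption: spread is expressed as sample variance along each direction.
    variances = s ** 2 / (m - 1)
    return mean, Vt, variances


# Four collinear 2D points: all of the variance lies along the diagonal.
points = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
mean, directions, variances = pca_sketch(points)
```

For these points the first direction is parallel to the line x = y (up to sign), and the variance along the second direction is numerically zero.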

Examples

>>>