Multivariate Normality Testing via 3D Projections

This tool explores a projection-based method for assessing multivariate normality. It compares the distribution of ellipsoid-fit errors (mean squared error, MSE) from random 3D projections of a test dataset with the distribution obtained from a reference multivariate normal (MVN) dataset.

Figures: 3D Projection Visualization (Last Tested Projection); Reference (Normal) MSE Distribution; Test (Non-Normal/Uploaded) MSE Distribution.

Methodology Explained

This method leverages the property that **any linear projection of a multivariate normal (MVN) distribution is itself (multivariate) normal**. As a result, every 3D projection of MVN data has ellipsoidal density contours. Departures from normality in the high-dimensional space often show up as non-ellipsoidal shapes in lower-dimensional projections.
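The property can be written out explicitly. Selecting coordinates i, j, k is the linear map given by the 3×N matrix A whose rows are the standard basis vectors e_i, e_j, e_k. These symbols are introduced here for illustration; the tool does not name them.

```latex
\begin{aligned}
X \sim \mathcal{N}_N(\mu, \Sigma)
  &\;\Longrightarrow\; Y = AX \sim \mathcal{N}_3\!\left(A\mu,\; A\Sigma A^{\top}\right),
  \qquad A = \begin{pmatrix} e_i^{\top} \\ e_j^{\top} \\ e_k^{\top} \end{pmatrix}, \\
D^2(Y) &= (Y - A\mu)^{\top}\left(A\Sigma A^{\top}\right)^{-1}(Y - A\mu) \sim \chi^2_3 .
\end{aligned}
```

Here AΣAᵀ is the 3×3 submatrix of Σ on rows and columns i, j, k, assumed positive definite. The level sets D² = c are ellipsoids, which is why an ellipsoid built from the sample mean and covariance is the natural shape to fit. For normal data, E[D²] = 3.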

Workflow:

  1. Generate or upload N-dimensional data. The "Normal" dataset serves as a **reference** for comparison.
  2. For a chosen dataset, repeatedly perform the following 'test':
    • Randomly sample 3 dimensions.
    • Project the N-dimensional data onto these 3 dimensions.
    • Fit an ellipsoid to the 3D projected points (based on the sample covariance matrix).
    • Calculate a goodness-of-fit metric: the Mean Squared Error (MSE) between the points and the fitted ellipsoid surface.
  3. Collect the MSE values from many tests (e.g., 100+).
  4. Compare the **distribution** of MSE values from the test dataset against the reference Normal dataset's MSE distribution using visualization (histograms) and a statistical test (Mann-Whitney U).
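The Python sketch below illustrates steps 2 and 3 under stated assumptions. The source does not define the "fitted ellipsoid surface" or the point-to-surface error precisely, so the sketch uses a hypothetical definition. The ellipsoid is centered at the sample mean, shaped by the sample covariance, and has squared Mahalanobis radius 3 (the expected value for 3D normal data). Each point's error is its distance to that surface in whitened (Mahalanobis) coordinates. The names `projection_mse` and `mse_distribution` are illustrative, not the tool's own.

```python
import numpy as np


def projection_mse(data, rng, k=3):
    """One 'test': pick k random coordinates, fit a covariance ellipsoid, return the MSE.

    ASSUMPTION (hypothetical definition): the fitted surface is
    {y : (y - mu)^T S^{-1} (y - mu) = k}, and each point's error is its distance
    to that surface in whitened coordinates, where the ellipsoid is a sphere of radius sqrt(k).
    """
    n_samples, n_dims = data.shape
    if n_dims < k or n_samples <= k:
        raise ValueError("need at least k dimensions and more than k samples")
    dims = rng.choice(n_dims, size=k, replace=False)  # randomly sample k distinct dimensions
    proj = data[:, dims]                               # project onto those coordinate axes
    mu = proj.mean(axis=0)                             # ellipsoid center: sample mean
    cov = np.cov(proj, rowvar=False)                   # ellipsoid shape: sample covariance
    centered = proj - mu
    # Squared Mahalanobis distance of every point: d2_i = c_i^T S^{-1} c_i
    d2 = np.einsum("ij,ij->i", centered, np.linalg.solve(cov, centered.T).T)
    gap = np.sqrt(d2) - np.sqrt(k)                     # signed distance to the whitened sphere
    return float(np.mean(gap ** 2))


def mse_distribution(data, n_tests=100, seed=0):
    """Repeat the projection test n_tests times and collect the MSE values."""
    rng = np.random.default_rng(seed)
    return np.array([projection_mse(data, rng) for _ in range(n_tests)])
```

For standard normal data, this MSE concentrates near E[(R − √3)²] ≈ 0.47, where R follows a chi distribution with 3 degrees of freedom. Skewness, heavy tails, or multimodality spread the Mahalanobis radii differently and typically move the MSE distribution away from that reference.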

Statistical Rationale & Advantages:

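  • **Dimensionality Reduction:** Each fit involves only a 3×3 covariance matrix. This mitigates the "curse of dimensionality" that affects many high-dimensional tests, such as those that must estimate and invert a full N×N covariance matrix.
  • **Sensitivity:** Can detect various types of departures from multivariate normality (e.g., skewness, multimodality, non-linear dependencies) that affect projection shapes. Because only coordinate subsets are examined, departures that appear only along oblique directions, or only jointly across more than three variables, may go undetected.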
  • **Comparison-Based:** The conclusion relies on comparing the test distribution to a known normal reference, rather than an absolute threshold.
  • **Visualization:** Provides intuitive visual feedback through the 3D plot and MSE histograms.

Generated Dataset Details:

  • Normal: Generated from an N-dimensional standard MVN distribution.
  • Non-Normal: Generated from a mixture of two MVN distributions with slightly different means and covariances to introduce non-normality.
  • Sample counts > 5000 may impact browser performance.
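A minimal NumPy sketch of the two generated datasets follows. The standard MVN reference matches the description above. The mixture parameters are hypothetical choices, since the tool's actual values are not stated: equal weights, a mean shift of 1.0 in every coordinate, and a second covariance with unit variances and 0.5 correlations. `generate_datasets` is an illustrative name.

```python
import numpy as np


def generate_datasets(n_dims=10, n_samples=1000, seed=0):
    """Return (normal, non_normal) arrays, each of shape (n_samples, n_dims)."""
    rng = np.random.default_rng(seed)
    normal = rng.standard_normal((n_samples, n_dims))  # standard MVN: N(0, I)

    # HYPOTHETICAL mixture parameters: component A is N(0, I); component B has a
    # shifted mean and an equicorrelated covariance (diagonal 1, off-diagonal 0.5).
    mean_b = np.full(n_dims, 1.0)
    cov_b = 0.5 * np.eye(n_dims) + 0.5 * np.ones((n_dims, n_dims))
    from_b = rng.random(n_samples) < 0.5                # each sample picks a component (p = 0.5)
    comp_a = rng.standard_normal((n_samples, n_dims))
    comp_b = rng.multivariate_normal(mean_b, cov_b, size=n_samples)
    non_normal = np.where(from_b[:, None], comp_b, comp_a)
    return normal, non_normal
```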

User Guide:

  • Use "Generate Datasets" first (adjust N/Samples if needed).
  • Run "Test Normal" to establish the reference MSE distribution.
  • Run "Test Non-Normal" or Upload and Test your own data.
  • Use "Compare & Conclude" for statistical comparison.
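  • The Mann-Whitney U test assesses whether the two MSE distributions (Reference vs. Test) differ, that is, whether values from one tend to be larger than values from the other. A low p-value (< 0.05) suggests a significant difference, indicating that the test data deviates from the normal reference. A high p-value does not prove normality. Because MSE values from repeated projections of the same dataset are not independent, treat the p-value as approximate.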
  • "New Projection" shows another random 3D view of the *last dataset tested*.
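The comparison step can be sketched with SciPy's `mannwhitneyu`, together with the mean and standard deviation the tool reports for each run. Two points are assumptions rather than details confirmed by the tool: the use of the sample standard deviation (ddof=1), and the helper name `compare_mse`. The input arrays in the usage lines are synthetic.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_mse(reference_mse, test_mse, alpha=0.05):
    """Summarize two samples of projection MSE values and compare them.

    ASSUMPTION: standard deviations use the sample formula (ddof=1).
    """
    ref = np.asarray(reference_mse, dtype=float)
    test = np.asarray(test_mse, dtype=float)
    result = mannwhitneyu(ref, test, alternative="two-sided")
    return {
        "reference_mean": ref.mean(),
        "reference_std": ref.std(ddof=1),
        "test_mean": test.mean(),
        "test_std": test.std(ddof=1),
        "U": float(result.statistic),
        "p_value": float(result.pvalue),
        # A small p-value flags a difference; a large one does not establish normality.
        "significant": bool(result.pvalue < alpha),
    }


# Usage with synthetic MSE samples (illustrative numbers, not results from the tool).
rng = np.random.default_rng(0)
reference = rng.normal(0.47, 0.02, size=100)
shifted = rng.normal(0.60, 0.05, size=100)
print(compare_mse(reference, shifted)["significant"])  # prints True
```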

Results

For both the Reference (Normal) run and the Test (Non-Normal/Uploaded) run, the tool reports the Mean MSE and the Std Dev MSE of the collected values.