Projections

Projections reduce high-dimensional data to 1D, 2D, or 3D for visualization.

Available Projection Types

VectorScope supports multiple projection types organized by output dimensionality:

1D Views (Feature Distribution)

Density - Distribution view with KDE curves or histogram
Box Plot - Distribution by class with quartiles and outliers
Violin - Distribution with density shape and box plot overlay

2D Views (Scatter Plots)

PCA - Principal Component Analysis (fast, linear, interpretable)
t-SNE - t-distributed Stochastic Neighbor Embedding (non-linear, cluster-focused)
UMAP - Uniform Manifold Approximation and Projection (non-linear, preserves structure)
Custom Axes - Project onto user-defined axes (e.g., barycenter directions)
Direct Axes - Use raw dimension values directly

3D Views (3D Scatter Plots)

PCA 3D - PCA with three principal components
Custom Axes 3D - Project onto three user-defined axes
Direct Axes 3D - Three raw dimensions as X, Y, Z axes

Creating Projections

From the Graph Editor:

Select a layer node
In the Config Panel, find “Add View”
Enter an optional name
Select the projection type
Click “Add View”

The projection is created and you can view it in the View Editor.

PCA (Principal Component Analysis)

PCA finds the directions of maximum variance in your data and projects onto them.

Advantages:

Fast computation
Deterministic (same result every time)
Preserves global structure
Components are interpretable

Parameters:

Components - Which principal components to use for X and Y axes (default: PC1, PC2)

When to use:

For quick exploration
When you want to see the directions of maximum variance
For large datasets where t-SNE is too slow

Configuring PCA

In the View Editor:

Select your PCA view
Use the X Axis and Y Axis dropdowns to choose different components
Click “Apply” to recompute

Try PC1 vs PC3, or PC2 vs PC3 to see different aspects of your data.

t-SNE (t-distributed Stochastic Neighbor Embedding)

t-SNE is a non-linear technique that emphasizes local structure and cluster separation.

Advantages:

Excellent at revealing clusters
Preserves local neighborhoods
Good for high-dimensional data

Disadvantages:

Slow for large datasets
Non-deterministic (different results each run)
Global distances are not meaningful
Sensitive to parameters

Parameters:

Perplexity - Balance between local and global aspects (5-50, default: 30)
Iterations - Number of optimization steps (250-2000, default: 1000)

When to use:

For exploring cluster structure
When you have time for computation
For datasets with well-defined local neighborhoods

Configuring t-SNE

In the View Editor:

Select your t-SNE view
Adjust the Perplexity slider - Lower values emphasize local structure - Higher values capture more global structure
Adjust the Iterations slider - More iterations = better convergence but slower
Click “Recompute”

Note

t-SNE creates a new random layout each time. The random_seed is saved with your session for reproducibility.

UMAP (Uniform Manifold Approximation and Projection)

UMAP is a modern non-linear dimensionality reduction technique that preserves both local and global structure.

Advantages:

Faster than t-SNE for large datasets
Better preserves global structure than t-SNE
More consistent results across runs
Good cluster separation

Parameters:

n_neighbors - Number of neighbors for local structure (default: 15) - Lower values emphasize local structure - Higher values capture more global patterns
min_dist - Minimum distance between points in embedding (default: 0.1) - Lower values create tighter clusters - Higher values spread points more evenly
spread - Scale of embedded points (default: 1.0)
metric - Distance metric (default: ‘euclidean’)

When to use:

For large datasets where t-SNE is too slow
When you need consistent, reproducible results
To see both local clusters and global relationships

Custom Axes

Projects data onto user-defined axes, typically created from barycenters of class clusters. This allows you to visualize data along semantically meaningful directions.

Parameters:

X Axis - First custom axis (select from defined axes)
Y Axis - Second custom axis (select from defined axes)
Mode - Projection mode:
- Oblique (default): Projects onto the closest point in the plane spanned by the axes
- Affine: Uses change of basis for exact coefficients
Flip X/Y - Negate the direction of each axis

Creating Custom Axes:

Before using Custom Axes view, you need to define axes:

Go to the Annotations panel
Select two points (e.g., class barycenters)
Click “Create Axis from Selection”
Name the axis (e.g., “setosa→virginica”)

Projection Modes:

The two modes compute coordinates differently:

Oblique: Finds (α, β) such that α*v1 + β*v2 is the closest point to x. Good for: Consistent visualization across views.
Affine: Finds exact coefficients in the change of basis x = c1*v1 + c2*v2 + … Good for: Mathematical precision, matching with Custom Affine transformation.

When to use:

To visualize data along semantically meaningful directions
To compare how different classes relate to specific concepts
When PCA axes don’t align with interpretable directions

Direct Axes

Direct axes view shows raw dimension values without any transformation.

Parameters:

dim_x - Which dimension to use for X axis (default: 0)
dim_y - Which dimension to use for Y axis (default: 1)
dim_z - Which dimension to use for Z axis (3D only, default: 2)

When to use:

To inspect specific dimensions directly
When you know which features are most important
For sanity checking data before transformations

Density View

Displays the distribution of values for a single dimension using KDE (kernel density estimation) or histogram visualization.

Parameters:

dim - Which dimension to display (default: 0)
bins - Number of bins for histogram mode (default: 30)
kde - Whether to show KDE curves (default: true)

Display modes:

KDE (default) - Shows smooth density curves for each class
Histogram - Shows binned counts with overlapping bars

When to use:

To understand the distribution of a single feature
To identify outliers
To compare distributions across classes (by color)
To see how well a feature separates classes

Box Plot View

Displays the distribution of values for a single dimension, grouped by class.

Parameters:

dim - Which dimension to display (default: 0)

Display:

Shows box-and-whisker plots for each class, including:

Median (center line)
Interquartile range (box)
Whiskers (1.5x IQR)
Outliers (individual points)

When to use:

To compare feature distributions across classes
To identify which features separate classes
To spot outliers within each class

Violin View

Displays the distribution of values for a single dimension using violin plots, which combine box plots with KDE density curves.

Parameters:

dim - Which dimension to display (default: 0)

Display:

Shows violin plots for each class, including:

Density shape (the “violin” curves)
Box plot overlay (quartiles)
Mean line

When to use:

When you want both box plot statistics and density visualization
To compare distribution shapes across classes
To see bimodal or multimodal distributions within classes

3D Projections

PCA, Custom Axes, and Direct Axes support 3D output for exploring data in three dimensions.

To create a 3D view:

Select a layer node in the Graph Editor
Click the “+” button to add a view
Choose from the 3D category: “PCA 3D”, “Custom Axes 3D”, or “Direct Axes 3D”
The 3D scatter plot will render in the View Editor

Available 3D projections:

PCA 3D - Uses PC1, PC2, and PC3 as X, Y, Z axes
Custom Axes 3D - Uses three user-defined custom axes as X, Y, Z
Direct Axes 3D - Choose any three raw dimensions for X, Y, Z

Interacting with 3D views:

Drag to rotate the view
Scroll to zoom in/out
Shift+drag to pan
Use axis range sliders to focus on regions of interest

When to use 3D:

When two dimensions don’t capture enough variance
To explore relationships between three features
When clusters overlap in 2D but separate in 3D

Custom Axes 3D

The 3D version of Custom Axes requires three user-defined axes for all three dimensions. This is useful for exploring data along three semantically meaningful directions simultaneously.

Parameters:

X Axis - First custom axis
Y Axis - Second custom axis
Z Axis - Third custom axis (required)
Mode - Projection mode (Oblique or Affine)
Flip X/Y/Z - Negate the direction of each axis
Center - Use mean (default) or a virtual point as origin

Creating axes for 3D:

You need at least three custom axes defined before using Custom Axes 3D:

Create class barycenters from selections (e.g., for Iris: setosa, versicolor, virginica)
Create axes between pairs of barycenters
Select all three axes in the Custom Axes 3D configuration

Example workflow:

Load the Iris dataset
Create selections for each class
Create barycenters for each selection
Create axes: setosa→versicolor, setosa→virginica, versicolor→virginica
Add a “Custom Axes 3D” view
Assign axes to X, Y, Z and click Apply

3D axis visualization:

Custom axes are displayed as colored arrow lines in the 3D scatter plot, showing the direction from point A to point B for each axis.

Comparing Projections

To compare different projections:

Create multiple projections from the same layer
Switch to the Viewports view
Add viewports and assign different projections
Selection is synchronized across viewports

This lets you see how the same points appear in PCA vs t-SNE, or with different parameters.

Understanding the Visualization

Point Colors

Points are colored by their metadata:

cluster - For synthetic data, the assigned cluster
class - For sklearn datasets, the target class
Label-based - For CSV data, unique labels get unique colors

Selection

Drag to select multiple points
Selected points are highlighted in all viewports
Selection count shows in the toolbar
Click “Clear Selection” to deselect

Best Practices

Start with PCA - It’s fast and gives a good overview
Try UMAP next - Good balance of speed and cluster quality
Use t-SNE for publication - When you need the best cluster separation
Try different components - PC1/PC2 isn’t always the most interesting view
Use Direct Axes to inspect raw features - Sanity check before complex projections
Use Density/Box Plot/Violin for feature analysis - Understand distributions by class
Compare views - The same data can look very different in different projections
Save your session - Preserve good parameter settings for reproducibility