Loading Data

VectorScope can load data from CSV and NumPy files, generate synthetic data, load built-in sklearn datasets, or restore a previously saved session.

Supported File Formats

CSV Files

Comma-separated values with a header row. VectorScope automatically detects:

  • Numeric columns - Used as vector dimensions

  • String columns - Used as labels

Example CSV:

id,species,sepal_length,sepal_width,petal_length,petal_width
1,setosa,5.1,3.5,1.4,0.2
2,setosa,4.9,3.0,1.4,0.2
...

When you load a CSV, VectorScope:

  1. Parses the header row

  2. Detects which columns are numeric

  3. Defaults to using all numeric columns as features

  4. Uses the first string column as labels
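The four default steps above can be approximated with Python's standard csv module. This is a minimal sketch, not VectorScope's own code: the helper names is_number and default_columns are hypothetical, and treating a column as numeric only when every non-empty cell parses as a float is an assumed detection rule.

```python
import csv
import io


def is_number(text):
    """Return True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def default_columns(csv_text):
    """Return (feature_columns, label_column) using the default rules.

    Assumption: a column is numeric only if every non-empty cell parses
    as a float; VectorScope's real detection rule may differ.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]  # 1. parse the header row
    numeric, strings = [], []
    for j, name in enumerate(header):  # 2. detect numeric columns
        cells = [row[j] for row in body if j < len(row) and row[j].strip()]
        if cells and all(is_number(c) for c in cells):
            numeric.append(name)
        else:
            strings.append(name)
    features = numeric  # 3. all numeric columns become features
    label = strings[0] if strings else None  # 4. first string column is the label
    return features, label


example = """id,species,sepal_length,sepal_width,petal_length,petal_width
1,setosa,5.1,3.5,1.4,0.2
2,setosa,4.9,3.0,1.4,0.2
"""
features, label = default_columns(example)
print(features)  # ['id', 'sepal_length', 'sepal_width', 'petal_length', 'petal_width']
print(label)     # species
```

Note that id parses as a number, so under these rules it lands among the default features; this is one reason to deselect it in the Config Panel.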

You can reconfigure columns in the Config Panel after loading.

NumPy Files (.npy, .npz)

NPY files contain a single 2D array of shape (n_points, n_dimensions).

NPZ files should contain an array stored under one of these names:

  • vectors

  • data

  • embeddings

  • X or x

If none of these names is present, the first array in the archive is used.

For NumPy files, columns are named dim_0, dim_1, etc., and you can configure them after loading.
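The NumPy rules above (a single 2D array in .npy, a preferred array name in .npz with a fallback to the first array, and dim_0, dim_1, ... column names) can be sketched with np.load. This is an illustrative sketch: load_numpy_vectors and PREFERRED_NAMES are hypothetical names, and checking X before x is an assumed precedence.

```python
import io

import numpy as np

# Preferred NPZ array names in the order listed above.
# Assumption: "X" is checked before "x".
PREFERRED_NAMES = ("vectors", "data", "embeddings", "X", "x")


def load_numpy_vectors(file):
    """Return (vectors, column_names) from an .npy or .npz path or file object."""
    loaded = np.load(file)
    if hasattr(loaded, "files"):  # .npz: np.load returns an NpzFile archive
        with loaded:
            name = next((n for n in PREFERRED_NAMES if n in loaded.files),
                        loaded.files[0])  # fall back to the first array
            vectors = loaded[name]
    else:  # .npy: np.load returns a single ndarray
        vectors = loaded
    if vectors.ndim != 2:
        raise ValueError(
            f"expected shape (n_points, n_dimensions), got {vectors.shape}")
    columns = [f"dim_{j}" for j in range(vectors.shape[1])]
    return vectors, columns


buf = io.BytesIO()
np.savez(buf, labels=np.arange(3), embeddings=np.ones((3, 4)))
buf.seek(0)
vectors, columns = load_numpy_vectors(buf)
print(vectors.shape)  # (3, 4)
print(columns)        # ['dim_0', 'dim_1', 'dim_2', 'dim_3']
```

Here the archive's first array is labels, but embeddings is chosen because it matches a preferred name.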

Loading Methods

Load Data Button

  1. Click Load Data in the toolbar

  2. Select a CSV, NPY, or NPZ file

  3. A new layer is created and selected in the Graph Editor

Create Synthetic

Generates random clustered data for testing. The defaults are:

  • 1000 points

  • 30 dimensions

  • 5 clusters
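A dataset with the default sizes above can be approximated by adding Gaussian noise around random cluster centers. This is a sketch under assumptions: the helper make_synthetic, the center range, the noise spread, the seed, and the round-robin cluster assignment are illustrative choices, not VectorScope's actual generator.

```python
import numpy as np


def make_synthetic(n_points=1000, n_dims=30, n_clusters=5, spread=1.0, seed=0):
    """Return (vectors, labels) for isotropic Gaussian clusters.

    Center range, spread, and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    centers = rng.uniform(-10.0, 10.0, size=(n_clusters, n_dims))
    # Round-robin assignment gives every cluster the same number of points.
    labels = np.arange(n_points) % n_clusters
    noise = rng.normal(0.0, spread, size=(n_points, n_dims))
    vectors = centers[labels] + noise
    return vectors, labels


vectors, labels = make_synthetic()
print(vectors.shape)            # (1000, 30)
print(np.bincount(labels))      # [200 200 200 200 200]
```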

Load Dataset

Built-in sklearn datasets:

  • Iris - 150 samples, 4 features, 3 classes

  • Wine - 178 samples, 13 features, 3 classes

  • Breast Cancer - 569 samples, 30 features, 2 classes

  • Digits - 1797 samples, 64 features, 10 classes

  • Diabetes - 442 samples, 10 features, regression target

  • Linnerud - 20 samples, 3 features

These datasets include their original feature names and, for the classification datasets, class labels.
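The scikit-learn loaders for these datasets return a Bunch with data, target, feature_names, and (for the classification sets) target_names, which map naturally onto vectors, column names, and point labels. The LOADERS table and load_builtin helper below are hypothetical; this sketches how such a menu could be wired up, not how VectorScope does it.

```python
import numpy as np
from sklearn.datasets import (
    load_breast_cancer,
    load_diabetes,
    load_digits,
    load_iris,
    load_linnerud,
    load_wine,
)

# Hypothetical mapping from the menu entries above to scikit-learn loaders.
LOADERS = {
    "Iris": load_iris,
    "Wine": load_wine,
    "Breast Cancer": load_breast_cancer,
    "Digits": load_digits,
    "Diabetes": load_diabetes,
    "Linnerud": load_linnerud,
}
# Datasets whose target is a class index into bunch.target_names.
CLASSIFICATION = {"Iris", "Wine", "Breast Cancer", "Digits"}


def load_builtin(name):
    """Return (vectors, feature_names, labels); labels is None for regression targets."""
    bunch = LOADERS[name]()
    vectors = np.asarray(bunch.data, dtype=np.float64)
    # Fall back to dim_<j> names if a loader provides no feature_names.
    names = getattr(bunch, "feature_names", None)
    if names is None:
        names = [f"dim_{j}" for j in range(vectors.shape[1])]
    feature_names = [str(n) for n in names]
    labels = None
    if name in CLASSIFICATION:
        labels = [str(bunch.target_names[i]) for i in bunch.target]
    return vectors, feature_names, labels


vectors, feature_names, labels = load_builtin("Iris")
print(vectors.shape)     # (150, 4)
print(feature_names[0])  # sepal length (cm)
print(labels[0])         # setosa
```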

Open Session

Load a previously saved VectorScope session (JSON + NPZ files).

Column Configuration

For source layers (layers loaded from data rather than derived from another layer), you can configure which columns to use:

  1. Select the layer in the Graph Editor

  2. In the Config Panel, you’ll see:

    • Label Column dropdown - Which column provides point labels

    • Feature Columns checkboxes - Which columns to use as vector dimensions

  3. Click Apply Column Configuration to update

This is especially useful for CSV files where you may want to:

  • Exclude identifiers (such as an id column) and other non-feature columns from the vector

  • Use a specific column for labels/coloring
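Conceptually, applying a column configuration means picking one label column and an explicit list of feature columns, then rebuilding the vector matrix from just those. The sketch below uses csv.DictReader; apply_column_config is a hypothetical helper that mirrors the Label Column and Feature Columns settings and is not VectorScope's API.

```python
import csv
import io

import numpy as np


def apply_column_config(csv_text, label_column, feature_columns):
    """Return (vectors, labels) built from the chosen columns only."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    missing = [c for c in [label_column, *feature_columns]
               if c not in reader.fieldnames]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    # Only the selected feature columns become vector dimensions.
    vectors = np.array([[float(row[c]) for c in feature_columns] for row in rows])
    labels = [row[label_column] for row in rows]
    return vectors, labels


example = """id,species,sepal_length,sepal_width,petal_length,petal_width
1,setosa,5.1,3.5,1.4,0.2
2,setosa,4.9,3.0,1.4,0.2
"""
# Exclude "id" and use "species" for labels/coloring.
vectors, labels = apply_column_config(
    example, "species",
    ["sepal_length", "sepal_width", "petal_length", "petal_width"])
print(vectors.shape)  # (2, 4)
print(labels)         # ['setosa', 'setosa']
```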

Note

Changing the column configuration recomputes all downstream projections and transformations.
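One way to picture this recomputation is to treat the Graph Editor as a directed acyclic graph: find every layer reachable from the changed source layer, then recompute them in topological order so each layer runs after its affected parents. This is a hypothetical model (the children mapping, layer names, and downstream_layers helper are illustrative), using graphlib.TopologicalSorter from the Python 3.9+ standard library.

```python
from collections import deque
from graphlib import TopologicalSorter


def downstream_layers(children, changed):
    """Return the layers derived from `changed`, parents before children.

    `children` maps a layer to the layers computed from it (assumed acyclic).
    """
    # Breadth-first search collects every layer reachable from `changed`.
    affected = set()
    queue = deque([changed])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    # Keep only edges between affected layers; `changed` is already up to date.
    parents = {layer: set() for layer in affected}
    for parent in affected:
        for child in children.get(parent, []):
            if child in parents:
                parents[child].add(parent)
    return list(TopologicalSorter(parents).static_order())


# Hypothetical graph: a source layer feeding a scaled layer and projections.
children = {
    "iris.csv": ["scaled"],
    "scaled": ["pca", "umap"],
    "pca": ["tsne"],
    "other.csv": ["umap"],
}
print(downstream_layers(children, "iris.csv"))
```

The printed order always starts with scaled and places pca before tsne; the relative position of umap can vary between runs.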

Data Limits

VectorScope keeps all data in memory, so dataset size is bounded in practice. Rough guidelines:

  • Points: Tens of thousands work well; hundreds of thousands may be slow

  • Dimensions: Hundreds is fine; thousands may slow down projections

  • t-SNE: Especially slow for large datasets; consider reducing dimensionality with PCA first
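Two quick checks help with these limits: a back-of-the-envelope memory estimate for a dense matrix, and a PCA step that shrinks the dimensionality before t-SNE (scikit-learn's TSNE documentation, for example, suggests reducing very high-dimensional data to around 50 dimensions first). The sketch below assumes float64 storage (8 bytes per value) and a target of 50 components; both are illustrative choices, and memory_mb and pca_reduce are hypothetical helper names.

```python
import numpy as np


def memory_mb(n_points, n_dims, bytes_per_value=8):
    """Approximate size of a dense (n_points, n_dims) matrix in megabytes.

    8 bytes per value assumes float64; float32 would halve the estimate.
    """
    return n_points * n_dims * bytes_per_value / 1e6


def pca_reduce(X, n_components=50):
    """Project X onto its top principal components via SVD of the centered data."""
    X = np.asarray(X, dtype=np.float64)
    k = min(n_components, X.shape[0], X.shape[1])
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


print(memory_mb(100_000, 1_000))  # 800.0
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 300))
X_small = pca_reduce(X, n_components=50)
print(X_small.shape)  # (2000, 50)
```

The reduced matrix can then be handed to t-SNE in place of the original, which cuts the cost of its distance computations.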