Vector Search is Broken for Structured Data.

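Standard embeddings flatten JSON into text, so a change to one critical field barely moves the vector and your vector database cannot tell the records apart. StructVector separates these "Evil Twins", records identical except for one decisive field, with 99.9% accuracy.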

$ pip install structvector

Anchor Event (Class: SAFE)

{
  "user_id": "u_8921",
  "amount": 450.00,
  "currency": "USD",
  "signals": {
    "ip_risk": "low",
    "kyc": true
  },
  "meta": { ...noise... }
}

Evil Twin (Class: FRAUD): identical to the anchor except "kyc": false

{
  "user_id": "u_8921",
  "amount": 450.00,
  "currency": "USD",
  "signals": {
    "ip_risk": "low",
    "kyc": false
  },
  "meta": { ...noise... }
}

OpenAI embedding (text-embedding-3): 99.5% similarity between the two events (they overlap)

StructVector embedding: 82.1% similarity between the two events (they are separated)
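To see why a text-based embedding barely registers the difference, flatten both events into strings the way a text pipeline typically would and compare them. This is a rough sketch, not an embedding model: `json.dumps` stands in for the flattening step, `difflib.SequenceMatcher` gives a character-level similarity, and the elided `...noise...` metadata is left out.

```python
import json
from difflib import SequenceMatcher

# The two events from above, minus the elided "meta" block.
anchor = {
    "user_id": "u_8921",
    "amount": 450.00,
    "currency": "USD",
    "signals": {"ip_risk": "low", "kyc": True},
}
# The evil twin: identical except for the KYC signal.
twin = {**anchor, "signals": {**anchor["signals"], "kyc": False}}

def flatten_to_text(event: dict) -> str:
    """Serialize an event to a single string, as a text-embedding pipeline might."""
    return json.dumps(event, sort_keys=True)

a_text = flatten_to_text(anchor)
b_text = flatten_to_text(twin)

# Character-level similarity: 1.0 means identical strings.
ratio = SequenceMatcher(None, a_text, b_text).ratio()
print(a_text)
print(b_text)
print(f"text similarity: {ratio:.3f}")
```

The two strings differ only in `true` versus `false`, so the ratio comes out around 0.97. Any model that sees the record only as text starts from inputs this close together.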

Semantic Geometry

We used t-SNE to visualize the embedding space of 100 "Safe" events and their corresponding "Fraud" clones.

Standard text embeddings map each Safe event and its Fraud twin to almost the same point, so the two classes overlap. StructVector instead learns a structure-aware manifold: a change to a field marked urgent moves the embedding away from its baseline, so Safe and Fraud events form distinct clusters that anomaly detection can separate.
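The effect of pushing an urgent field change away from the baseline can be illustrated with a hand-built toy encoding. This is not StructVector's model, whose internals are not described here. The sketch assumes each (field path, value) pair gets its own dimension scaled by a per-field weight, and it compares the two events with cosine similarity. The 20 `field_*` entries are a hypothetical stand-in for the elided metadata noise, and the weight of 10 on `signals.kyc` is an assumption borrowed from the "urgent" priority in the integration example.

```python
import math

def flatten(event: dict, prefix: str = "") -> dict:
    """Turn nested dicts into {"dotted.path": leaf_value}."""
    out = {}
    for key, value in event.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

def encode(event: dict, weights: dict, vocab: list) -> list:
    """One dimension per (path, value) feature, scaled by that path's weight (default 1.0)."""
    present = {(path, repr(value)) for path, value in flatten(event).items()}
    return [weights.get(path, 1.0) if (path, val) in present else 0.0 for path, val in vocab]

def cosine(u: list, v: list) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical stand-in for the elided "meta" noise: 20 fields shared by both events.
noise = {f"field_{i}": i for i in range(20)}
anchor = {"user_id": "u_8921", "amount": 450.00, "currency": "USD",
          "signals": {"ip_risk": "low", "kyc": True}, "meta": noise}
twin = {**anchor, "signals": {"ip_risk": "low", "kyc": False}}

# Shared feature vocabulary built from both events.
vocab = sorted({(p, repr(v)) for e in (anchor, twin) for p, v in flatten(e).items()})

uniform = {}                         # every field weighted 1.0
structured = {"signals.kyc": 10.0}   # assumed "urgent" weight on the decisive field

sim_uniform = cosine(encode(anchor, uniform, vocab), encode(twin, uniform, vocab))
sim_structured = cosine(encode(anchor, structured, vocab), encode(twin, structured, vocab))
print(f"uniform weights:  {sim_uniform:.3f}")
print(f"kyc weighted 10x: {sim_structured:.3f}")
```

With uniform weights the twins share 24 of 25 features and score 0.960. Weighting the KYC field 10x drops the score to about 0.194, because the one differing feature now dominates each vector's norm. In this toy the separation comes entirely from the hand-set weight.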

Latency (CPU): 2 ms

Parameters: …

Output dimension: 1536

Twin separation: 94%
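To check a per-call latency figure like the 2 ms above on your own CPU, time many single-record calls and report the median. The sketch below works with any zero-argument callable. `dummy_encode` is a hypothetical placeholder workload that you would replace with your own encode call.

```python
import statistics
import time
from typing import Callable

def median_latency_ms(fn: Callable[[], object], warmup: int = 10, runs: int = 200) -> float:
    """Median wall-clock time of fn() in milliseconds, measured after warm-up calls."""
    for _ in range(warmup):
        fn()  # let lazy initialization and caches settle before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# Hypothetical placeholder workload; swap in e.g. lambda: embedder.encode(record, schema)
def dummy_encode():
    return sum(i * i for i in range(1000))

print(f"median latency: {median_latency_ms(dummy_encode):.3f} ms")
```

The median is less sensitive than the mean to occasional slow calls from garbage collection or OS scheduling.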

Fig. 1. t-SNE visualization of StructVector manifold separation

Integration

pipeline.py

import structvector as sv

# 1. Define your schema priorities, keyed by dotted field path
schema = sv.Schema({
    "prediction.label": "urgent",       # 10x weight
    "transaction.amount": "high",       # 5x weight
    "meta.processingTime": "none",      # Ignored
})

# 2. Initialize the adapter (runs on CPU)
embedder = sv.StructEmbedder(model_path="fraud_v1.pth")

# 3. Generate a structure-aware vector; the record's nesting matches the schema paths
vector = embedder.encode({
    "id": "tx_123",
    "transaction": {"amount": 5000},
    "prediction": {"label": "fraud"},
}, schema)

print(vector.shape)  # (512,)
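The schema maps dotted field paths to priority labels, and the comments give the multipliers (urgent 10x, high 5x, none ignored). How StructVector applies them internally is not documented here. The sketch below illustrates one plausible reading: it resolves each label to a number and looks each path up in a nested record. `PRIORITY_WEIGHTS`, `get_path`, `resolve_weights`, and the fallback weight are hypothetical helpers, not part of the library.

```python
from typing import Any

# Assumed label -> multiplier table, taken from the comments in pipeline.py.
PRIORITY_WEIGHTS = {"urgent": 10.0, "high": 5.0, "none": 0.0}
DEFAULT_WEIGHT = 1.0  # assumption: unrecognized labels fall back to weight 1

def get_path(record: dict, path: str) -> Any:
    """Follow a dotted path like "prediction.label" into nested dicts; None if absent."""
    node: Any = record
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def resolve_weights(schema: dict, record: dict) -> dict:
    """Map each schema path that exists in the record to its numeric weight."""
    resolved = {}
    for path, label in schema.items():
        if get_path(record, path) is not None:
            resolved[path] = PRIORITY_WEIGHTS.get(label, DEFAULT_WEIGHT)
    return resolved

schema = {
    "prediction.label": "urgent",
    "transaction.amount": "high",
    "meta.processingTime": "none",
}
record = {"id": "tx_123", "transaction": {"amount": 5000}, "prediction": {"label": "fraud"}}

print(resolve_weights(schema, record))
# A flat record with top-level "amount"/"label" keys matches none of these paths.
print(resolve_weights(schema, {"id": "tx_123", "amount": 5000, "label": "fraud"}))
```

Under this reading, a record's nesting has to mirror the schema paths, or the priorities silently never apply.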

Runs on CPU. No GPU required.
