DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data

Supplementary Material

 

 


Dataset Examples

Examples of triplets from our dataset.

[Figure: 24 example triplets, labeled 57415, 16297, 71617, 36138, 95017, 10128, 19336, 61196, 91828, 85015, 32162, 89576, 81818, 34778, 29755, 51773, 56046, 19391, 55550, 98247, 24264, 55111, 61480, 35841.]

 


Nearest Neighbor Visualizations

Nearest neighbor searches on ImageNet-R and COCO.

Ours = DreamSim.

[Figure: six nearest-neighbor panels. Each panel shows an input image and the nearest neighbors retrieved by LPIPS, DISTS, OpenCLIP, DINO, and Ours.]
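
As a rough illustration of how a nearest neighbor search like the one visualized above can be run with an embedding-based metric, the sketch below ranks gallery items by cosine distance to a query. It assumes each image has already been mapped to a feature vector; the toy vectors and the helper names `cosine_distance` and `nearest_neighbors` are hypothetical and are not taken from the paper or its code. Metrics such as LPIPS and DISTS compute distances differently, so this covers only the simple embedding case.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def nearest_neighbors(query, gallery, k=3):
    """Return the indices of the k gallery vectors closest to the query."""
    ranked = sorted(
        range(len(gallery)),
        key=lambda i: cosine_distance(query, gallery[i]),
    )
    return ranked[:k]

# Toy embeddings standing in for image features (hypothetical values).
gallery = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

top2 = nearest_neighbors(query, gallery, k=2)
print(top2)
```

Swapping the embedding function while keeping the ranking step fixed is one way to compare metrics side by side, as the panels do.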

 


Inversion Visualizations

Inversion visualization using optimization, deep image prior, and guided diffusion.

Ours = DreamSim.

[Figure: nine inversion panels. Each panel shows a target image and the inversion results for DINO, OpenCLIP, Ensemble, and Ours across five runs (Run 1 to Run 5).]