Comparison of WOVEN and DNN performance

This repository includes comparisons of WOVEN and the DNN with human behavioral performance across three tasks: (1) stiffness estimation, where humans perform well; (2) mass estimation, where humans perform at chance; and (3) a novel prediction of the effect of wind on mass estimation, which is validated by human behavioral results. The paper detailing these findings is currently under review. For a copy of the manuscript, please email me at [wenyan.bi@yale.edu].

Evaluation 1: 2AFC stiffness matching task

Task

On each trial, participants viewed a triad of cloth animations: a target video at the top center and two test videos at the bottom. The top video is the "target item"; one of the bottom videos shares the target's stiffness value (the "match item"), and the other has a different stiffness value (the "distractor item"). The mass value of each of the three videos was randomly chosen from a predefined set of four values: [4^-1, 4^-0.5, 4^0, 4^0.5]. Participants were instructed to choose the test video whose stiffness matched that of the target cloth.
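The trial structure above can be sketched in code as a minimal illustration. The README does not list the stiffness values used, so STIFFNESS_LEVELS below is a hypothetical placeholder, and drawing each video's mass independently is an assumption; the names ClothVideo and make_stiffness_trial are likewise illustrative, not taken from the study's code.

```python
import random
from dataclasses import dataclass

# Mass levels given in the task description: 4^-1, 4^-0.5, 4^0, 4^0.5.
MASS_LEVELS = [4 ** e for e in (-1.0, -0.5, 0.0, 0.5)]

# HYPOTHETICAL stiffness levels: the actual values are not listed in this README.
STIFFNESS_LEVELS = [1.0, 2.0, 3.0, 4.0]


@dataclass
class ClothVideo:
    stiffness: float
    mass: float


def make_stiffness_trial(rng: random.Random) -> dict:
    """Build one target/match/distractor triad for the stiffness matching task."""
    # Two distinct stiffness values: one shared by target and match, one for the distractor.
    target_s, distractor_s = rng.sample(STIFFNESS_LEVELS, 2)
    # Assumption: each video's mass is drawn independently from the four levels.
    return {
        "target": ClothVideo(target_s, rng.choice(MASS_LEVELS)),
        "match": ClothVideo(target_s, rng.choice(MASS_LEVELS)),
        "distractor": ClothVideo(distractor_s, rng.choice(MASS_LEVELS)),
    }


if __name__ == "__main__":
    print(make_stiffness_trial(random.Random(0)))
```

For reference, the four mass levels evaluate to 0.25, 0.5, 1, and 2, so adjacent levels differ by a factor of 2. Because mass varies freely across the triad, a participant can only succeed by attending to stiffness.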

Results

When the WOVEN and DNN models are calibrated to match the average accuracy of human participants, WOVEN explains a greater portion of the variance in human behavioral performance than the DNN does.
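The README does not say how the models were calibrated or how variance explained was computed. One plausible approach, sketched here purely as an assumption, maps each model's estimated differences from the target to a choice probability with a logistic decision rule. Its noise parameter tau is tuned until mean model accuracy equals mean human accuracy, and the squared correlation between per-trial model and human accuracies is then reported. The functions p_correct, calibrate_tau, and r_squared and the toy numbers are hypothetical.

```python
import math


def p_correct(d_match: float, d_distractor: float, tau: float) -> float:
    """Probability of choosing the match item under a logistic choice rule.

    d_match / d_distractor: model-estimated property differences from the target.
    tau: decision-noise temperature (larger tau -> choices closer to chance).
    """
    evidence = (abs(d_distractor) - abs(d_match)) / tau
    # Numerically stable logistic function (avoids overflow for large |evidence|).
    if evidence >= 0:
        return 1.0 / (1.0 + math.exp(-evidence))
    z = math.exp(evidence)
    return z / (1.0 + z)


def model_accuracy(trials, tau):
    """Per-trial predicted accuracy for a list of (d_match, d_distractor) pairs."""
    return [p_correct(dm, dd, tau) for dm, dd in trials]


def calibrate_tau(trials, human_mean_acc, grid_size=400):
    """Grid search over log-spaced tau in [1e-3, 1e3] to match the human mean accuracy."""
    taus = [10 ** (-3 + 6 * i / (grid_size - 1)) for i in range(grid_size)]

    def gap(tau):
        accs = model_accuracy(trials, tau)
        return abs(sum(accs) / len(accs) - human_mean_acc)

    return min(taus, key=gap)


def r_squared(xs, ys):
    """Squared Pearson correlation: share of variance in ys explained linearly by xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the study.
    trials = [(0.1, 0.9), (0.2, 0.5), (0.4, 0.45), (0.6, 0.3)]
    human_acc = [0.9, 0.75, 0.55, 0.4]
    tau = calibrate_tau(trials, sum(human_acc) / len(human_acc))
    print(tau, r_squared(model_accuracy(trials, tau), human_acc))
```

Matching mean accuracy first puts both models on equal footing, so any remaining difference in variance explained reflects which trials each model finds hard rather than its overall performance level.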

Evaluation 2: 2AFC mass matching task

Task

The procedure was the same as in the stiffness matching task, except that participants were asked to match the cloths by mass.

Results

WOVEN also generalizes better to a new task, the mass matching task, explaining more of the variance in human behavioral performance than the DNN.

Evaluation 3: WOVEN predicts novel effects of wind on mass estimation

WOVEN's joint inference of mass, stiffness, and wind strength predicts that wind strength affects the estimation of these physical properties differently: stiffness estimation is unaffected by wind strength, whereas mass estimation is influenced by it. When WOVEN infers stronger winds, it tends to perceive the cloth as lighter.

We predicted, and confirmed, that humans match mass more accurately in trials that exclude the wind scenario than in trials that include it. In particular, the effect of the ground-truth mass difference was significantly greater in the wind-excluding trials than in the wind-including trials. This effect was quantified using the regression model shown at the top of the figure, with the predictors stiffDiff = |s_Match − s_Target| − |s_Distractor − s_Target| and massDiff = |m_Match − m_Distractor|. WOVEN showed qualitatively the same pattern of perceptual constancy as humans, whereas this effect was reversed for the DNN.
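The README defines the two predictors but not the full regression specification. As an assumption, the sketch below treats it as a logistic regression of trial accuracy (correct = 1) on stiffDiff and massDiff, fitted separately to wind-excluding and wind-including trials so that the two massDiff coefficients can be compared. A significance test of the difference between the coefficients is omitted. The trial dictionary keys and the function names are hypothetical.

```python
import math


def predictors(s_target, s_match, s_distractor, m_match, m_distractor):
    """stiffDiff and massDiff for one mass-matching trial, as defined in the text."""
    stiff_diff = abs(s_match - s_target) - abs(s_distractor - s_target)
    mass_diff = abs(m_match - m_distractor)
    return stiff_diff, mass_diff


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, iters=3000):
    """Maximum-likelihood logistic regression via batch gradient ascent.

    Returns weights [intercept, b_stiffDiff, b_massDiff].
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(y)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for row, yi in zip(X, y):
            feats = (1.0,) + tuple(row)
            err = yi - sigmoid(sum(wj * fj for wj, fj in zip(w, feats)))
            for j, fj in enumerate(feats):
                grad[j] += err * fj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def mass_effect_by_condition(trials):
    """Fit correct ~ stiffDiff + massDiff separately for wind-excluding and
    wind-including trials; return the massDiff coefficient for each."""
    effects = {}
    for wind in (False, True):
        subset = [t for t in trials if t["wind"] == wind]
        X = [predictors(t["s_target"], t["s_match"], t["s_distractor"],
                        t["m_match"], t["m_distractor"]) for t in subset]
        y = [t["correct"] for t in subset]
        effects["wind" if wind else "no_wind"] = fit_logistic(X, y)[2]
    return effects
```

Under the prediction described above, the "no_wind" coefficient should exceed the "wind" coefficient for human responses, and also for WOVEN's.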
