Stack error in the data loader due to tensors with different shapes.
How to reproduce:
Run the preprocessing step `python3 scripts/baseline/get_npy.py run 'your_path_to_spectrogram_npy'` on the mood/theme subset, since `baseline.pth` outputs 56 classes.
Run the train command `python3 scripts/baseline/main.py --mode 'TRAIN'`
Trying to solve:
The article says: "we only used a centered 29.1s audio segment"
I believe this is equivalent to computing the mel with `melspectrograms.py` with `full_audio` set to False. That yields a [96, 1366] tensor, which is the shape needed to run inference with the baseline model.
Since the mels in the dataset were calculated over the full duration of each audio file, the data loader might need to take a centered [96, 1366] segment from the dataset's mels.
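A minimal sketch of that centering, assuming the mels are NumPy arrays of shape [96, n_frames]. The helper name `center_crop` and the choice to raise on tracks that are too short are my own assumptions, not something taken from the baseline code:

```python
import numpy as np

TARGET_FRAMES = 1366  # input length expected by the baseline model, per the text above


def center_crop(mel, target_frames=TARGET_FRAMES):
    """Return the centered [n_mels, target_frames] slice of a [n_mels, n_frames] mel."""
    n_frames = mel.shape[1]
    if n_frames < target_frames:
        # Assumption: too-short tracks are rejected; zero-padding would be another option.
        raise ValueError(f"mel has only {n_frames} frames, need {target_frames}")
    start = (n_frames - target_frames) // 2
    return mel[:, start:start + target_frames]


# Two dummy mels of different lengths, standing in for dataset entries.
mels = [np.zeros((96, 9602), dtype=np.float32),
        np.zeros((96, 5000), dtype=np.float32)]

# After cropping, every item has the same shape, so stacking no longer fails.
batch = np.stack([center_crop(m) for m in mels])
print(batch.shape)  # (2, 96, 1366)
```

Applying something like this inside the dataset's `__getitem__` would give every item the same shape before the default collate function stacks them.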
To check whether taking the 29.1s audio segment is equivalent to centering a [96, 1366] segment in the dataset's mels, I computed a mel from the audio myself. I obtained the same dimensions, but different values. For example, for the 00/13400.mp3 audio, the precomputed mel and the mel calculated with `melspectrograms.py` both have dimensions [96, 9602]. But if you print both at [:,0], the precomputed one from the dataset contains the following numbers:
[-69.5358, -64.7463, -61.8604, -59.8808, -58.1119, -58.2752, -58.9025,
-60.2660, -62.0527, -64.3706, -68.4771, -72.2208, -75.7047, -79.4953,
-85.4376, -85.6893, -81.9504, -80.0834, -79.7122, -82.1272, -89.4751,
-90.0000, -90.0000, -90.0000, -90.0000, -88.8482, -86.1220, -84.0110,
-81.6328, -81.6245, -82.9754, -83.6547, -85.0630, -88.5137, -90.0000,
-87.7471, -85.0853, -82.7995, -84.5712, -88.1776, -88.0879, -86.8838,
-89.5533, -90.0000, -84.0632, -81.3411, -83.6548, -87.9001, -90.0000,
-90.0000, -88.2064, -84.8365, -85.5288, -87.3742, -88.8410, -90.0000,
-90.0000, -85.1121, -83.0755, -86.6247, -90.0000, -89.6840, -87.7929,
-84.6036, -86.9026, -90.0000, -90.0000, -87.8175, -83.3707, -84.7766,
-90.0000, -90.0000, -90.0000, -90.0000, -90.0000, -88.1323, -90.0000,
-88.8589, -90.0000, -90.0000, -90.0000, -88.7473, -90.0000, -89.0149,
-90.0000, -90.0000, -90.0000, -90.0000, -90.0000, -90.0000, -88.6646,
-90.0000, -90.0000, -90.0000, -90.0000, -90.0000]
While the one calculated with `melspectrograms.py` contains:
[-139.0715, -129.4926, -123.7208, -119.7616, -116.2238, -116.5503,
-117.8051, -120.5321, -124.1054, -128.7413, -136.9542, -144.4415,
-151.4094, -158.9905, -170.8752, -171.3786, -163.9008, -160.1669,
-159.4244, -164.2545, -178.9503, -193.2552, -186.2103, -188.0788,
-188.0027, -177.6964, -172.2440, -168.0220, -163.2655, -163.2491,
-165.9508, -167.3093, -170.1259, -177.0275, -180.8245, -175.4943,
-170.1705, -165.5989, -169.1423, -176.3552, -176.1757, -173.7676,
-179.1065, -182.1857, -168.1263, -162.6822, -167.3096, -175.8002,
-185.6764, -189.3085, -176.4127, -169.6730, -171.0577, -174.7484,
-177.6820, -192.4283, -181.8572, -170.2243, -166.1510, -173.2494,
-181.5207, -179.3679, -175.5858, -169.2072, -173.8052, -189.5120,
-199.9228, -175.6349, -166.7414, -169.5531, -190.6465, -191.5059,
-186.6069, -193.5956, -188.5288, -176.2646, -181.7400, -177.7178,
-189.9011, -180.9200, -181.7761, -177.4945, -183.4301, -178.0298,
-189.3605, -186.7196, -189.7235, -185.6219, -188.4031, -185.2255,
-177.3292, -184.3699, -185.4904, -200.0000, -200.0000, -200.0000]
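A quick check on a few of the values listed above, as a sketch only: where the precomputed mel is not at -90, the recomputed values look like exactly twice the precomputed ones, and halving then flooring at -90 reproduces the precomputed numbers. This might point to a different dB scaling (for example 10·log10 versus 20·log10) and a different clipping floor, but that interpretation is an assumption I have not verified against the scripts.

```python
import numpy as np

# Entries 0-5 and 20-22 of the two [:, 0] columns printed above.
precomputed = np.array([-69.5358, -64.7463, -61.8604, -59.8808, -58.1119,
                        -58.2752, -89.4751, -90.0000, -90.0000])
recomputed = np.array([-139.0715, -129.4926, -123.7208, -119.7616, -116.2238,
                       -116.5503, -178.9503, -193.2552, -186.2103])

# Ratio on the entries where the precomputed mel is above its -90 floor.
unclipped = precomputed > -90.0
ratio = recomputed[unclipped] / precomputed[unclipped]
print(ratio)

# Halve the recomputed values and apply a -90 floor, then compare.
rescaled = np.maximum(recomputed / 2.0, -90.0)
print(np.allclose(rescaled, precomputed, atol=1e-3))
```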
Also, there is a bug in the validation function, because the data loader returns 3 values, not 2.
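One way the unpacking could be made tolerant of the extra value is sketched below. The helper `split_batch`, the placeholder batch contents, and the assumption that inputs and labels come first are all hypothetical; I have not confirmed the order in which the data loader returns its values.

```python
def split_batch(batch):
    """Take the first two items as (inputs, labels) and collect any extras."""
    # Assumption: inputs and labels are the first two elements of each batch.
    inputs, labels, *extras = batch
    return inputs, labels, extras


# Placeholder batches standing in for what the data loader yields.
fake_loader = [("x0", "y0", "extra0"), ("x1", "y1", "extra1")]

seen = []
for batch in fake_loader:
    inputs, labels, extras = split_batch(batch)
    seen.append((inputs, labels, len(extras)))
```

The same star-unpacking also works if the loader yields only two values, in which case `extras` is empty.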