v0.4.0 introduced error for DataLoader which collates images #139

Open
jeremiedb opened this issue Jan 7, 2023 · 4 comments

@jeremiedb

jeremiedb commented Jan 7, 2023

Updating MLUtils from v0.3.1 to v0.4.0 causes a data loader to fail:

ERROR: LoadError: DimensionMismatch: stack expects uniform slices, got axes(x) == (62:285, 18:241, 1:3) while first had (38:261, 18:241, 1:3)
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base ./task.jl:871
  [2] wait()
    @ Base ./task.jl:931
  [3] wait(c::Base.GenericCondition{ReentrantLock})
    @ Base ./condition.jl:124
  [4] take_buffered(c::Channel{Any})
    @ Base ./channels.jl:416
  [5] take!(c::Channel{Any})
    @ Base ./channels.jl:410
  [6] iterate(#unused#::MLUtils.Loader, state::MLUtils.LoaderState)
    @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/parallel.jl:140
  [7] iterate(loader::MLUtils.Loader)
    @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/parallel.jl:132
  [8] iterate(e::DataLoader{ValContainer{Vector{String}, Vector{String}}, Random._GLOBAL_RNG, Val{true}})
    @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/eachobs.jl:173
  [9] eval_f(m::ResNet, data::DataLoader{ValContainer{Vector{String}, Vector{String}}, Random._GLOBAL_RNG, Val{true}})
    @ Main ~/github/ImageNetTrain.jl/resnet-optim.jl:161

Strangely, the axes sizes reported in the message look legitimate.
The error message appears to come from https://github.com/JuliaLang/Compat.jl/blob/295c146528063385a0d89bc2be12a7f534052d82/src/Compat.jl#L610, which itself is called by _dim_stack: https://github.com/JuliaLang/julia/blob/de73c26fbff61d07a38c9653525b530a56630831/base/abstractarray.jl#L2847

The error can be reproduced with the following script, although it assumes ImageNet data is available: https://github.com/jeremiedb/ImageNetTrain.jl/blob/main/experiments/loaders/test-loader-min.jl
I'll provide a more minimal reproducible example in case the details above don't already hint at the issue that came with v0.4.

It may be tied to #119, though it isn't clear to me whether or how the Loader for images would no longer be properly defined for MLUtils.

@jeremiedb jeremiedb changed the title from "v0.4.0 introduced error for" to "v0.4.0 introduced error for DataLoader which collates images" Jan 8, 2023
@jeremiedb
Author

Here's a MWE:

using Images
using StatsBase: sample, shuffle
using DataAugmentation
using Flux
using TestImages

import Base: length, getindex
import Flux.MLUtils: getobs, getobs!

const im_size = (224, 224)

imgs = ["chelsea", "coffee"]

struct ImageContainer{T<:Vector}
    img::T
end

length(data::ImageContainer) = length(data.img)
tfm_train = DataAugmentation.compose(ScaleKeepAspect(im_size))

function getobs(data::ImageContainer, idx::Int)
    path = data.img[idx]
    # img = Images.load(path)
    img = testimage(path)
    img = apply(tfm_train, Image(img))
    img = itemdata(img)
    # img = permutedims(channelview(RGB.(itemdata(img))), (3, 2, 1))
    return img
end

data = ImageContainer(imgs)
deval1 = Flux.DataLoader(data, batchsize=2, collate = true, partial = false)

Including the line img = itemdata(img) causes the initialization of deval1 to crash. If that line is commented out, deval1 is created without error.

Although this may look at first glance like an Images-related issue, I think it is more closely tied to DataLoader, since calling batch = getobs(data, 1); works fine and returns the image. So the getobs function can be evaluated successfully.
Also, if collate is set to false, it works fine as well: deval2 = Flux.DataLoader(data, batchsize=2, collate = false, partial = false)

@lorenzoh, would you have a take on this one?

@jeremiedb
Author

(CatDogPanda) pkg> st
Project CatDogPanda v0.1.0
Status `C:\github\CatDogPanda\Project.toml`
  [336ed68f] CSV v0.10.9
  [052768ef] CUDA v3.12.1
  [88a5189c] DataAugmentation v0.2.11
  [587475ba] Flux v0.13.11
  [916415d5] Images v0.25.2
  [2913bbd2] StatsBase v0.33.21
  [5e47fb64] TestImages v1.7.1

@mcabbott
Contributor

These axes represent images the same size, with offset indices:

julia> length.((62:285, 18:241, 1:3))
(224, 224, 3)

julia> length.((38:261, 18:241, 1:3))
(224, 224, 3)

I presume the previous version ignored offsets & made an Array, like the cat functions do at present.

Julia 1.9's stack instead takes offsets seriously, and propagates them to the output, hence demands equality. Something like stack(OffsetArrays.no_offset_view, images) would avoid this.

Flux won't work at all on arrays with offset indices. So there's some chance MLUtils should always remove them?
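
The suggestion above can be sketched as follows. This is only an illustration, assuming Julia 1.9 or later (or Compat.jl supplying stack on older versions) and OffsetArrays.jl; the small arrays a and b are hypothetical stand-ins for the cropped images, using the offsets from the error message.

```julia
using OffsetArrays

# Two arrays of identical size whose first axis starts at different offsets,
# mimicking the two images reported in the DimensionMismatch error.
a = OffsetArray(reshape(collect(1f0:48f0), 4, 4, 3), 62:65, 18:21, 1:3)
b = OffsetArray(reshape(collect(1f0:48f0), 4, 4, 3), 38:41, 18:21, 1:3)

same_size = size(a) == size(b)   # sizes agree
same_axes = axes(a) == axes(b)   # axes do not, because of the offsets

# stack requires every slice to have the same axes, so this is expected to throw.
err = try
    stack([a, b])
    nothing
catch e
    e
end

# Dropping the offsets from each element first lets stack build a plain Array.
batch = stack(OffsetArrays.no_offset_view, [a, b])
```

Here batch is an ordinary 4×4×3×2 Array with 1-based indices, which is the kind of input Flux expects.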

@jeremiedb
Author

jeremiedb commented Jan 14, 2023

I think your diagnosis is right.
I just tried:

path = imgs[1]
_img = testimage(path)
_img = apply(tfm_train, Image(_img))
size(_img.data)
img = channelview(float32.(_img.data))
julia> typeof(img)
Base.ReinterpretArray{Float32, 3, RGB{Float32}, OffsetArrays.OffsetMatrix{RGB{Float32}, Matrix{RGB{Float32}}}, true}

Then, on this reinterpreted OffsetMatrix:

julia> Array(img)
ERROR: DimensionMismatch: axes must agree, got (Base.OneTo(3), Base.OneTo(224), Base.OneTo(224)) and (Base.OneTo(3), OffsetArrays.IdOffsetRange(values=2:225, indices=2:225), OffsetArrays.IdOffsetRange(values=59:282, indices=59:282))

However, collect(img) works fine.

The above is with Julia 1.8.4.
The same behavior is also observed with a previous version of Setfield (pre v1), which I thought could have been the cause.
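
As a small illustration of why collect helps, here is a sketch that uses only OffsetArrays.jl, not the image pipeline itself. The matrix x is a hypothetical stand-in for a cropped image whose axes start at 59 and 2.

```julia
using OffsetArrays

# A 3×4 matrix indexed by 59:61 along rows and 2:5 along columns.
x = OffsetArray(reshape(collect(1f0:12f0), 3, 4), 59:61, 2:5)

# collect copies the elements into a plain Array with 1-based axes.
y = collect(x)
```

The result y is a Matrix{Float32} with the same values as x but no offsets, at the cost of one extra copy.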

In short, using collect seems to be the proper way to get a plain array out of such an image data loader. For example:

tfm_train = DataAugmentation.compose(ScaleKeepAspect(im_size), CenterCrop(im_size))

function getobs(data::ImageContainer, idx::Int)
    path = data.img[idx]
    _img = testimage(path)
    _img = apply(tfm_train, Image(_img))
    img = collect(channelview(float32.(itemdata(_img))))
    return img
end

A caveat of this approach, however, is that it results in more allocation: 2.1Mb instead of the 1.615 MiB seen on MLUtils v0.3.1, where collect could be omitted. Do you see a way to avoid this?
