about the input/output size #8

Open
Danee-wawawa opened this issue May 12, 2023 · 14 comments

@Danee-wawawa

Hi, thank you for your work.
Do the input size and output size support other sizes, such as 640 x 512 or 512 x 512? If so, where does the code need to be modified?
Looking forward to your answer.

@usert5432
Collaborator

usert5432 commented May 14, 2023

Hi @Danee-wawawa,

Do the input size and output size support other sizes, such as 640 x 512 or 512 x 512?

This depends on several factors. In the simplest case, where your data consists of images and you want to translate between images of the same size (e.g. 512 x 512 -> 512 x 512), uvcgan2 supports it.

If so, where does the code need to be modified?

In the case described above, you would need to modify the data configuration of the training script. Taking the male2female script as an example:

    'datasets' : [
        {
            'dataset' : {
                'name'   : 'image-domain-hierarchy',
                'domain' : domain,
                'path'   : 'celeba_hq_resized_lanczos',
            },
            'shape'           : (3, 256, 256),
            'transform_train' : [
                'random-flip-horizontal',
            ],
            'transform_test'  : None,
        } for domain in [ 'male', 'female' ]
    ],

You would need to change the shape parameter to match the desired shape, e.g. 'shape' : (3, 512, 512).
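As a minimal sketch of this change, the dataset entries can be generated from a single shape value so that both domains stay consistent. The helper name `make_datasets_config` is hypothetical and not part of uvcgan2; the keys and values are copied from the male2female example in this thread.

```python
# Hypothetical helper (not part of uvcgan2): builds the 'datasets' entry
# of the male2female data configuration for a given image shape.
def make_datasets_config(shape, domains=('male', 'female')):
    return [
        {
            'dataset' : {
                'name'   : 'image-domain-hierarchy',
                'domain' : domain,
                'path'   : 'celeba_hq_resized_lanczos',
            },
            'shape'           : shape,
            'transform_train' : [
                'random-flip-horizontal',
            ],
            'transform_test'  : None,
        } for domain in domains
    ]

# Same configuration as the example, but for 512 x 512 images.
datasets = make_datasets_config((3, 512, 512))
```

The resulting list would be placed under the 'datasets' key of the training script in place of the hard-coded one.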

If your case is more complicated, more modifications may be required. Please let me know if you have further questions.

@Danee-wawawa
Author

Thank you for your reply. Now 512 x 512 -> 512 x 512 is OK, and I want to try 640 x 512 -> 640 x 512. Does this situation require modifying the network structure?

@usert5432
Collaborator

usert5432 commented May 15, 2023

I want to try 640 x 512 -> 640 x 512. Does this situation require modifying the network structure?

No, you do not need to modify the network structure for 640 x 512 images (modifying the shape parameter should be enough). In general, as long as your image dimensions are divisible by 16, you can use the default network structure.
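The divisibility rule above can be checked before training with a few lines of Python. This is only a sketch: the function name `is_supported_size` is made up for illustration and is not a uvcgan2 API, and the factor of 16 is taken from the reply above.

```python
def is_supported_size(height, width, multiple=16):
    """Return True if both image dimensions are divisible by `multiple`."""
    return height % multiple == 0 and width % multiple == 0

print(is_supported_size(512, 640))  # True: 512 = 32 * 16 and 640 = 40 * 16
print(is_supported_size(500, 500))  # False: 500 leaves a remainder of 4
```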

With that said, it may be helpful to tune the network structure a bit to achieve the best performance, but it is not necessary.

@Danee-wawawa
Author

OK, thank you~~

@Pudding-0503

Pudding-0503 commented Jun 8, 2023

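Hello @usert5432,

One would need to modify shape parameter to match the desired shape, e.g. 'shape' : (3, 512, 512).

If your case is more complicated, more modifications may be required. Please let me know if you have further questions.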

Sorry to bother you, I also have a question about image size. If my image shape is (3, 512, 512), then

  'shape' : (3, 256, 256),

is changed to

  'shape' : (3, 512, 512),

Do the following lines:

  'transform_train' : [
    { 'name' : 'resize', 'size' : 286, },
    { 'name' : 'random-crop', 'size' : 256, },

need to be changed accordingly to

  'transform_train' : [
     { 'name' : 'resize', 'size' : 512, },
     { 'name' : 'random-crop', 'size' : 256, },

or to some other numbers?

@usert5432
Collaborator

Hi @Pudding-0503,

The data transformations depend heavily on the dataset that you have. For instance, if you have a large dataset (>= 5k images) and the objects that you want to translate have approximately the same size, then perhaps you do not need to apply any transformations at all (or you can limit them to just a random horizontal flip).

In general, I would suggest starting with only the random-flip-horizontal transformation, e.g.

                'transform_train' : [
                    'random-flip-horizontal',
                ],

Then see how the translation works. If it does not work, adjust the network hyperparameters (as described in the README). If it still does not work, add new transformations.

@Pudding-0503

OK! I got it, thank you very much~~~

@Danee-wawawa
Author

Hi, I want to try 640 x 512 -> 640 x 512, and I modified the data configuration of the training script as follows:
[image: screenshot of the modified data configuration]
But I get the following error:
[image: screenshot of the error message]
How can I solve this problem?
Looking forward to your answer.

@usert5432
Collaborator

Hi @Danee-wawawa. Can you try switching the order of dimensions? That is, setting shape = (3, 512, 640) instead of (3, 640, 512).
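The fix above is consistent with the shape being ordered as (channels, height, width), which is the common PyTorch convention; that ordering is an assumption here and is not stated explicitly in this thread. Under that assumption, a small hypothetical helper (not part of uvcgan2) can convert the usual "width x height" description of an image into the shape tuple:

```python
def shape_from_width_height(width, height, channels=3):
    """Convert a 'width x height' image size into a (C, H, W) shape tuple.

    Assumes the configuration expects (channels, height, width) order.
    """
    return (channels, height, width)

# A 640 x 512 image (640 wide, 512 tall) maps to shape (3, 512, 640).
print(shape_from_width_height(640, 512))  # (3, 512, 640)
```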

@Danee-wawawa
Author

It is OK, thank you~

@sophiatmu

sophiatmu commented Dec 14, 2023

Hi, I want to try 1460 x 1080 -> 1460 x 1080, but I got this:
[three images attached]

How can I solve this problem?

Thank you for your reply

@usert5432
Collaborator

Hi @sophiatmu,

Unfortunately, I do not think uvcgan will work on images of size 1460 x 1080. It expects image dimensions to be divisible by 16, but neither 1460 nor 1080 is.
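To see how far these dimensions are from a usable size, the nearest multiples of 16 can be computed as in the sketch below. The helper `nearest_multiples` is hypothetical and not part of uvcgan; whether cropping, padding, or resizing the images to one of these sizes gives good translations is not something this thread confirms.

```python
def nearest_multiples(size, multiple=16):
    """Return the closest multiples of `multiple` at or below and at or above `size`."""
    lower = (size // multiple) * multiple
    upper = lower if lower == size else lower + multiple
    return lower, upper

for size in (1460, 1080):
    print(size, nearest_multiples(size))
# 1460 (1456, 1472)
# 1080 (1072, 1088)
```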

@usert5432 usert5432 self-assigned this Dec 16, 2023
@usert5432 usert5432 added the question Further information is requested label Dec 16, 2023
@create-li

Hi @Danee-wawawa. Can you try switching the order of dimensions? That is, setting shape = (3, 512, 640) instead of (3, 640, 512).

Excuse me, is it true that higher resolution results in better image translation and higher metrics?

@usert5432
Collaborator

Hello @create-li

Excuse me, is it true that higher resolution results in better image translation and higher metrics?

Unfortunately, we have not tried training on higher-resolution images, so I am not sure. Perhaps somebody in this thread could provide feedback on higher-resolution training.
