Return of the loop vectorization compiler issue using gcc 11.2.0 on Windows MSYS2 #6384
Comments
I can not confirm the green cast, because my msys2 gcc 11.2 RT builds crash as soon as I open a raw :-( |
I cannot reproduce any longer... not sure what was going on. Closing for now. |
@Thanatomanic I already work on narrowing it... Edit: It indeed looks like the old gcc issue, but this time with doubles instead of floats. Maybe they fixed it only for single precision floats... or they reinvented the bug because their regression test did only cover single precision float, who knows. Anyway, I'm on it... |
@Thanatomanic Applying this silly patch solves the issue (not that we want to apply it) and clearly shows that there's an issue in gcc 11.2:
diff --git a/rtengine/gauss.cc b/rtengine/gauss.cc
index 99201a860..a27fc5a9c 100644
--- a/rtengine/gauss.cc
+++ b/rtengine/gauss.cc
@@ -19,7 +19,7 @@
#include <cmath>
#include <cstdlib>
#include <cstring>
-
+#include <iostream>
#include "gauss.h"
#include "boxblur.h"
@@ -651,7 +651,13 @@ template<class T> void gaussHorizontal (T** src, T** dst, const int W, const int
for (int j = 0; j < 3; j++) {
M[i][j] /= (1.0 + b1 - b2 + b3) * (1.0 + b2 + (b1 - b3) * b3);
}
-
+#pragma omp single
+{
+ for (int i = 0; i < 3; i++)
+ for (int j = 0; j < 3; j++) {
+ std::cout << M[i][j] << std::endl;
+ }
+}
double temp2[W] ALIGNED16;
#ifdef _OPENMP
|
@Thanatomanic You need to enable raw ca correction with avoid color shift to trigger the bug |
It's this loop that causes the issue. It is (wrongly) vectorized in gcc 11.2, but (correctly) not vectorized in gcc 10.3: https://github.com/Beep6581/RawTherapee/blob/dev/rtengine/gauss.cc#L667 Here's a code snippet to check with godbolt.org...
|
@heckflosse I've temporarily patched |
@Thanatomanic please push |
Commit referencing this issue: (Temporarily) disable `ftree-loop-vectorize` for GCC 11 because of #6384
@Thanatomanic I see you've disabled the vectorization for gcc 11. Did you observe the same problem with earlier versions of gcc 11, not just 11.2? |
@Lawrence37 I suspect earlier versions of the 11 branch have never been publicly available for MSYS2. At least, they're not in the public repo: http://repo.msys2.org/mingw/x86_64/ |
MSYS2 updated to GCC 11.3.0 but the bug remains. |
@Thanatomanic I've built RT without |
I was hoping to test if GCC 12.1.0 fixes the issue, but I'm not able to reproduce the bug on GCC 11.2.0/11.3.0 even with the same commit @Thanatomanic used. AboutThisBuild.txt
|
For me it is trivial to reproduce in a fully updated MSYS2 using GCC 12.1.0 and removing |
That's interesting. I upgraded to GCC 12.1.0 and now the image is green. |
@ValZapod I just tested on Windows, but 12.2 still has the same issue. I don't have the courage to file a bug report with the GCC devs... |
Was anyone able to reproduce the vectorization bug with the code snippet provided by @heckflosse? I tried with different inputs but they all resulted in the expected output. I was hoping to use the snippet as a starting point for creating a minimum example for a bug report. |
Hello. Excuse my questions, but for me MSYS2, git, etc. are like Chinese: I scrupulously transcribe the instructions, but I don't understand (or I misunderstand) what I'm doing. For health reasons (which unfortunately are still present) I haven't updated MSYS2 for a long time. I'm on « gcc 10.3 ». I see there are problems with gcc versions 11 and 12. Can I update Msys2 using ? : Or should I wait? I see in CMakeLists.txt: if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND ((CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "10.0" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS "10.2") OR (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL "11.0"))) In case it doesn't work (I'm on Windows 8, my machine is very old... 11 years; I had to change it at the beginning of 2022, but...), what are the instructions to be entered line by line either :
I will keep these instructions carefully for later use. Thank you for this educational information. Jacques |
@Desmis It is safe to upgrade your MSYS2 environment and upgrade GCC to the latest version. The code in So, you can just open the |
@Thanatomanic |
@Lawrence37 Not explicitly, but it is fairly hard to test anyway because the code requires
// fast gaussian approximation if the support window is large
template<class T> void gaussHorizontal (T** src, T** dst, const int W, const int H, const double sigma)
{
    double b1, b2, b3, B, M[3][3];
    calculateYvVFactors<double>(sigma, b1, b2, b3, B, M);
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            M[i][j] /= (1.0 + b1 - b2 + b3) * (1.0 + b2 + (b1 - b3) * b3);
        }
    }
    double temp2[W] ALIGNED16;
    printf("src=[%g %g %g %g %g\n",src[0][0],src[0][1],src[0][2],src[0][3],src[0][4]);
    printf("%g %g %g %g %g\n",src[1][0],src[1][1],src[1][2],src[1][3],src[1][4]);
    printf("%g %g %g %g %g\n",src[2][0],src[2][1],src[2][2],src[2][3],src[2][4]);
    printf("%g %g %g %g %g\n",src[3][0],src[3][1],src[3][2],src[3][3],src[3][4]);
    printf("%g %g %g %g %g]\n",src[4][0],src[4][1],src[4][2],src[4][3],src[4][4]);
#ifdef _OPENMP
    #pragma omp for
#endif
    for (int i = 0; i < H; i++) {
        temp2[0] = B * src[i][0] + b1 * src[i][0] + b2 * src[i][0] + b3 * src[i][0];
        temp2[1] = B * src[i][1] + b1 * temp2[0] + b2 * src[i][0] + b3 * src[i][0];
        temp2[2] = B * src[i][2] + b1 * temp2[1] + b2 * temp2[0] + b3 * src[i][0];
        for (int j = 3; j < W; j++) {
            temp2[j] = B * src[i][j] + b1 * temp2[j - 1] + b2 * temp2[j - 2] + b3 * temp2[j - 3];
        }
        double temp2Wm1 = src[i][W - 1] + M[0][0] * (temp2[W - 1] - src[i][W - 1]) + M[0][1] * (temp2[W - 2] - src[i][W - 1]) + M[0][2] * (temp2[W - 3] - src[i][W - 1]);
        double temp2W = src[i][W - 1] + M[1][0] * (temp2[W - 1] - src[i][W - 1]) + M[1][1] * (temp2[W - 2] - src[i][W - 1]) + M[1][2] * (temp2[W - 3] - src[i][W - 1]);
        double temp2Wp1 = src[i][W - 1] + M[2][0] * (temp2[W - 1] - src[i][W - 1]) + M[2][1] * (temp2[W - 2] - src[i][W - 1]) + M[2][2] * (temp2[W - 3] - src[i][W - 1]);
        temp2[W - 1] = temp2Wm1;
        temp2[W - 2] = B * temp2[W - 2] + b1 * temp2[W - 1] + b2 * temp2W + b3 * temp2Wp1;
        temp2[W - 3] = B * temp2[W - 3] + b1 * temp2[W - 2] + b2 * temp2[W - 1] + b3 * temp2W;
        for (int j = W - 4; j >= 0; j--) {
            temp2[j] = B * temp2[j] + b1 * temp2[j + 1] + b2 * temp2[j + 2] + b3 * temp2[j + 3];
        }
        for (int j = 0; j < W; j++) {
            dst[i][j] = (T)temp2[j];
        }
    }
    printf("dst=[%g %g %g %g %g\n",dst[0][0],dst[0][1],dst[0][2],dst[0][3],dst[0][4]);
    printf("%g %g %g %g %g\n",dst[1][0],dst[1][1],dst[1][2],dst[1][3],dst[1][4]);
    printf("%g %g %g %g %g\n",dst[2][0],dst[2][1],dst[2][2],dst[2][3],dst[2][4]);
    printf("%g %g %g %g %g\n",dst[3][0],dst[3][1],dst[3][2],dst[3][3],dst[3][4]);
    printf("%g %g %g %g %g]\n",dst[4][0],dst[4][1],dst[4][2],dst[4][3],dst[4][4]);
}
So, simply check the input and output of both arrays. I then configure and compile RawTherapee as follows:
Doing the same, but with
|
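As a side note, the five printf calls per array in the listing above could be folded into a small helper. This is only a sketch: `cornerToString` is a hypothetical name, not something that exists in RawTherapee, and it uses stream formatting rather than `%g`, so the digits printed may differ slightly from the printf version.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical debugging helper (not part of RawTherapee): formats the
// top-left n x n corner of a row-pointer image as "name=[row\nrow...]\n".
template <class T>
std::string cornerToString(const char* name, T** a, int W, int H, int n = 5)
{
    assert(a != nullptr && W > 0 && H > 0 && n > 0);
    const int rows = std::min(n, H);   // never read past the image height
    const int cols = std::min(n, W);   // never read past the image width
    std::ostringstream os;
    os << name << "=[";
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            os << (j ? " " : "") << a[i][j];
        }
        os << (i + 1 < rows ? "\n" : "]\n");
    }
    return os.str();
}
```

Inside `gaussHorizontal` this would be used as `std::cout << cornerToString("src", src, W, H);` before the loop and the same with `dst` after it.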
I run with Msys2 But when I run jacques@pc-bureau MINGW64 /g/code/repo-rt/build jacques@pc-bureau MINGW64 /g/code/repo-rt/build Can you help me ? Jacques |
@Desmis Was your |
No I will try with another build Jacques |
@Thanatomanic The system compile... now I will verify after, if it run well. Thank you Jacques |
Some additional information on the bug: it seems there is an explosive loss of precision and propagation of errors while performing this loop in
Under
But when the bug appears, the numbers are the same only up to about the 40th index, and it quickly goes from bad to worse:
|
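To locate the index where two runs start to disagree, something like the following could be used. It is a minimal sketch, not code from RawTherapee: `forwardPass` and `firstDivergence` are hypothetical names, the forward recursion is copied in simplified single-row form, and using a `long double` run as the reference is an assumption (on some platforms `long double` is no wider than `double`).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Forward pass of the third-order recursion, with the accumulator type as a
// template parameter so the same code can run in double and in long double.
// The coefficients are supplied by the caller.
template <class Acc>
std::vector<Acc> forwardPass(const std::vector<float>& src, Acc B, Acc b1, Acc b2, Acc b3)
{
    assert(src.size() >= 4);
    std::vector<Acc> t(src.size(), Acc(0));
    for (std::size_t j = 3; j < src.size(); ++j) {
        // Each element depends on the three previous results.
        t[j] = B * src[j] + b1 * t[j - 1] + b2 * t[j - 2] + b3 * t[j - 3];
    }
    return t;
}

// Index of the first element where the two runs differ by more than tol,
// or a.size() if they agree everywhere.
template <class A, class R>
std::size_t firstDivergence(const std::vector<A>& a, const std::vector<R>& ref, double tol)
{
    assert(a.size() == ref.size());
    for (std::size_t j = 0; j < a.size(); ++j) {
        const long double d = static_cast<long double>(a[j]) - static_cast<long double>(ref[j]);
        if (std::fabs(d) > tol) {
            return j;
        }
    }
    return a.size();
}
```

Comparing the output of a suspect build against a reference build (or against the `long double` run) with a loose tolerance would report the first index at which the error has grown beyond ordinary rounding noise.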
My system hangs here... [266/300] Building CXX object rtgui/CMakeFiles/rth.dir/splash.cc.obj Jacques |
@Thanatomanic My computer is very very old, and it is very hot... I think all works fine Thank you again Jacques |
It does not hang, just give it time. This is an unfortunate downside of the new GCC, see here: #6548 |
OK thank you Jacques |
@ZaquL It is not. There is a proposed solution, but it has not been fixed in the main branch. |
That was helpful @Thanatomanic. It turns out the This is the simplest program I can make which shows the problem. Any other reduction yields correct results. I wish it could be reduced further, but if it's not possible, then I guess I'll use this example to open a bug report with the GCC folks.
/*
* This file is part of RawTherapee.
*
* Copyright (c) 2004-2010 Gabor Horvath <hgabor@rawtherapee.com>
*
* RawTherapee is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* RawTherapee is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with RawTherapee. If not, see <https://www.gnu.org/licenses/>.
*/
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>
template<class T> void calculateYvVFactors( const T sigma, T &b1, T &b2, T &b3, T &B, T M[3][3])
{
    // coefficient calculation
    T q;
    if (sigma < 2.5) {
        q = 3.97156 - 4.14554 * sqrt (1.0 - 0.26891 * sigma);
    } else {
        q = 0.98711 * sigma - 0.96330;
    }
    T b0 = 1.57825 + 2.44413 * q + 1.4281 * q * q + 0.422205 * q * q * q;
    b1 = 2.44413 * q + 2.85619 * q * q + 1.26661 * q * q * q;
    b2 = -1.4281 * q * q - 1.26661 * q * q * q;
    b3 = 0.422205 * q * q * q;
    B = 1.0 - (b1 + b2 + b3) / b0;
    b1 /= b0;
    b2 /= b0;
    b3 /= b0;
    for (int i = 0; i < 9; i++) {
        M[i/3][i%3] = 0;
    }
}
// fast gaussian approximation if the support window is large
template<class T> void gaussHorizontal (T* src, T* dst, const int W, const double sigma)
{
    double b1, b2, b3, B, M[3][3];
    calculateYvVFactors<double>(sigma, b1, b2, b3, B, M);
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            M[i][j] /= (1.0 + b1 - b2 + b3) * (1.0 + b2 + (b1 - b3) * b3);
        }
    double temp2[W];
    std::fill(temp2, temp2 + W, 0);
    for (int j = 3; j < W; j++) {
        // FIXME: Bug is here!
        temp2[j] = B * src[j] + b1 * temp2[j - 1] + b2 * temp2[j - 2] + b3 * temp2[j - 3];
    }
    for (int j = W - 4; j >= 0; j--) {
        // FIXME: and/or here!
        temp2[j] = B * temp2[j] + b1 * temp2[j + 1] + b2 * temp2[j + 2] + b3 * temp2[j + 3];
    }
    for (int j = 0; j < W; j++) {
        dst[j] = (T)temp2[j];
    }
}
template<class T> void gaussianBlurImpl(T* src, T* dst, const int W, const double sigma)
{
    gaussHorizontal<T> (src, dst, W, sigma);
    double b1, b2, b3, B, M[3][3];
    calculateYvVFactors<double>(sigma, b1, b2, b3, B, M);
}
void gaussianBlur(float* src, float* dst, const int W, const double sigma)
{
    gaussianBlurImpl<float>(src, dst, W, sigma);
}
int main() {
    constexpr int w = 100;
    std::vector<float> src_data(w, 1);
    std::vector<float> dst_data(w);
    gaussianBlur(src_data.data(), dst_data.data(), w, 30.0);
    std::cout << dst_data[w/2] << std::endl;
    return 0;
}
Edit: It does appear to be a loss of precision problem. Increasing the size of the "image" means more values need to be propagated, which results in worse numerical results. |
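One way to see why this recursion is sensitive to rounding is to look at the coefficients themselves. The sketch below reuses the coefficient formulas from the example program (without the M matrix); `YvVCoeffs` and `yvvCoeffs` are hypothetical names introduced only for illustration. For sigma = 30 the weight B on the new sample comes out on the order of 1e-4, while b1, b2 and b3 are of order one with alternating signs and sum to 1 - B, so every step adds and subtracts nearly cancelling terms and small rounding differences are carried forward.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the recursion coefficients (not a RawTherapee type).
struct YvVCoeffs {
    double B, b1, b2, b3;
};

// Same coefficient formulas as calculateYvVFactors in the example program,
// leaving out the M matrix.
YvVCoeffs yvvCoeffs(double sigma)
{
    assert(sigma > 0.0);
    const double q = sigma < 2.5
        ? 3.97156 - 4.14554 * std::sqrt(1.0 - 0.26891 * sigma)
        : 0.98711 * sigma - 0.96330;
    const double b0 = 1.57825 + 2.44413 * q + 1.4281 * q * q + 0.422205 * q * q * q;
    const double b1 = 2.44413 * q + 2.85619 * q * q + 1.26661 * q * q * q;
    const double b2 = -1.4281 * q * q - 1.26661 * q * q * q;
    const double b3 = 0.422205 * q * q * q;
    const double B = 1.0 - (b1 + b2 + b3) / b0;
    // Normalised feedback weights; B + b1 + b2 + b3 equals 1 up to rounding.
    return {B, b1 / b0, b2 / b0, b3 / b0};
}
```

Printing the four values for a few sigmas shows B shrinking as sigma grows, which is consistent with the observation that larger inputs give worse results once the rounding behaviour changes.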
@Lawrence37 Thanks for the code. I can confirm that this returns correct values with both However, now that we seem to have traced the origin, I would be surprised if this were actually the only place where such a loss of precision happens. I have the impression that code like this is all over the place... 🤔 |
Maybe the reason why we are not seeing the problem elsewhere is because these three conditions must be satisfied:
If we are confident that FMA is the problem, we should change the |
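For readers unfamiliar with why FMA can change results at all, here is a small self-contained illustration. It does not show the GCC bug itself, and whether FMA contraction is the actual cause is not established at this point in the thread. It only demonstrates that `a * b + c` rounded in two steps and `std::fma(a, b, c)` rounded once can differ; the function names are made up for the example, and the `volatile` store is just a way to keep the compiler from fusing the two steps.

```cpp
#include <cassert>
#include <cmath>

// a * b + c with the product rounded to double before the addition.
// The volatile store prevents the compiler from contracting this into an FMA.
double mulAddSeparate(double a, double b, double c)
{
    volatile double product = a * b;
    return product + c;
}

// a * b + c with a single rounding at the end (fused multiply-add).
double mulAddFused(double a, double b, double c)
{
    return std::fma(a, b, c);
}

// Worked case: (1 + 2^-30) * (1 - 2^-30) is exactly 1 - 2^-60,
// which rounds to 1.0 when stored as a double.
void fmaWorkedExample()
{
    const double eps = std::ldexp(1.0, -30);
    const double a = 1.0 + eps;
    const double b = 1.0 - eps;
    assert(mulAddSeparate(a, b, -1.0) == 0.0);                  // product rounded first
    assert(mulAddFused(a, b, -1.0) == -std::ldexp(1.0, -60));   // exact product kept
}
```

In a recursion where each output feeds the next three, a difference of this kind at one step is passed along to every later step.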
Made a bug report here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902 |
@ValZapod I checked with a less trimmed-down version of the code and the bug remains. Hopefully the GCC people are able to find the cause using the sample I provided. |
It is not fixed. If they need the bug to be reproducible in |
At least things are being looked at seriously, and there is even a recommendation on what to do to avoid the issue. |
@ValZapod maybe I'm misunderstanding, but it seems clear to me that they believe it probably isn't fixed yet. See this comment regarding the bisected commits.
|
Perhaps we are talking about different bugs. The bug in the GCC-compiled example program is fixed. The bug I'm referring to is the one in GCC itself, which is not fixed. |
That's why I wrote "compiled", i.e. the binary executable. I have not touched my computer for several days. That's one reason why I didn't upload the code. Besides, the GCC people have not asked for a sample that shows the bug in |
@ValZapod The fact that the problem for the presented file is solved in trunk, seems to me to be unrelated to the bug report. That is also what Richard Biener thinks. He claims the referenced commits are unrelated to the actual problem. So, while the issue for this file seems magically fixed, it was unlikely due to Lawrence's report. Lawrence claims the bug is still there for an expanded testcase, which we will probably need to submit to the GCC people. |
Back on my computer :)
You have a newer version of GCC than the one on godbolt.org? Here's the example code with the
/*
* This file is part of RawTherapee.
*
* Copyright (c) 2004-2010 Gabor Horvath <hgabor@rawtherapee.com>
*
* RawTherapee is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* RawTherapee is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with RawTherapee. If not, see <https://www.gnu.org/licenses/>.
*/
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>
template<class T> void calculateYvVFactors( const T sigma, T &b1, T &b2, T &b3, T &B, T M[3][3])
{
    // coefficient calculation
    T q;
    if (sigma < 2.5) {
        q = 3.97156 - 4.14554 * sqrt (1.0 - 0.26891 * sigma);
    } else {
        q = 0.98711 * sigma - 0.96330;
    }
    T b0 = 1.57825 + 2.44413 * q + 1.4281 * q * q + 0.422205 * q * q * q;
    b1 = 2.44413 * q + 2.85619 * q * q + 1.26661 * q * q * q;
    b2 = -1.4281 * q * q - 1.26661 * q * q * q;
    b3 = 0.422205 * q * q * q;
    B = 1.0 - (b1 + b2 + b3) / b0;
    b1 /= b0;
    b2 /= b0;
    b3 /= b0;
    for (int i = 0; i < 9; i++) {
        M[i/3][i%3] = 0;
    }
}
// fast gaussian approximation if the support window is large
template<class T> void gaussHorizontal (T* src, T* dst, const int W, const int H, const double sigma)
{
    double b1, b2, b3, B, M[3][3];
    calculateYvVFactors<double>(sigma, b1, b2, b3, B, M);
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            M[i][j] /= (1.0 + b1 - b2 + b3) * (1.0 + b2 + (b1 - b3) * b3);
        }
    double temp2[W];
    std::fill(temp2, temp2 + W, 0);
    for (int j = 3; j < W; j++) {
        // FIXME: Bug is here!
        temp2[j] = B * src[j] + b1 * temp2[j - 1] + b2 * temp2[j - 2] + b3 * temp2[j - 3];
    }
    for (int j = W - 4; j >= 0; j--) {
        // FIXME: and/or here!
        temp2[j] = B * temp2[j] + b1 * temp2[j + 1] + b2 * temp2[j + 2] + b3 * temp2[j + 3];
    }
    for (int j = 0; j < W; j++) {
        dst[j] = (T)temp2[j];
    }
}
template<class T> void gaussianBlurImpl(T* src, T* dst, const int W, const double sigma)
{
    gaussHorizontal<T> (src, dst, W, 1, sigma);
    double b1, b2, b3, B, M[3][3];
    calculateYvVFactors<double>(sigma, b1, b2, b3, B, M);
}
void gaussianBlur(float* src, float* dst, const int W, const double sigma)
{
    gaussianBlurImpl<float>(src, dst, W, sigma);
}
int main() {
    constexpr int w = 100;
    std::vector<float> src_data(w, 1);
    std::vector<float> dst_data(w);
    gaussianBlur(src_data.data(), dst_data.data(), w, 30.0);
    std::cout << dst_data[w/2] << std::endl;
    return 0;
}
|
Replace the previous workaround of setting -fno-tree-loop-vectorize for a GCC optimization bug. (Beep6581#6384)
From the GCC man page:
On and off are currently the same, but off is more correct for disabling the problematic FMA. |
I updated my MSYS2 installation today and lo and behold, the -fno-tree-loop-vectorize compiler bug is back. This time with a green color cast instead of magenta.
To reproduce
Fully update MSYS2, compile RT dev branch, open any Bayer file, apply Neutral, toggle 'Auto-correction' for RAW Chromatic Aberration Correction. Voilà, green pixels.
AboutThisBuild.txt
Temporary solution
Building with -DCMAKE_CXX_FLAGS='-fno-tree-loop-vectorize' removes the issue.
Can anyone confirm first before we may need to head back to the people upstream?
@heckflosse