Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization in OpenSwathWorkflow #92

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from
Open

Conversation

xchen98
Copy link

@xchen98 xchen98 commented May 11, 2020

No description provided.

@xchen98
Copy link
Author

xchen98 commented May 12, 2020

1.Part: SignalToNoiseEstimator
It coud be found that the type pf data, map<PeakType, doule>, was used in SignalToNoiseEstimator of Tool OpenSwathWorkflow,in which the values corresponded to the iteration of map are storaged. The stored values are synchronously updated contiously with the iteration, using the funtion named getSignalToNoise. However, vector could be more efficient when processing large amounts of data. Thus,vectors are applied here.From the diagram shown below, it could be easily found that the processing speed differs a lot while using vector and map.(left: map | right: vector)
runtime of OpenSwathWorkflow with map: 18:23.228 s
runtime of OpenSwathWorkflow with vector: 15:17.435s
Signal_Vergleiche

@@ -39,6 +39,7 @@
#include <OpenMS/CONCEPT/Exception.h>
#include <OpenMS/DATASTRUCTURES/ListUtils.h>
#include <vector>
#include <algorithm>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in other files the specific reason for a (non-obvious) include was commented. E.g. #include <algorithm> // for std::max_element

Copy link
Author

@xchen98 xchen98 May 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll edit it later ,thanks for your commend :)

// ++windows_overall;
// ++run;
// }
int windows_overall = c.size();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int windows_overall is only used once here - neccessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be used directly in the function . If we used it, it can makes code more clear and understandable.

@@ -370,7 +376,7 @@ namespace OpenMS
}

// store result
stn_estimates_[*window_pos_center] = (*window_pos_center).getIntensity() / noise;
stn_estimates_[window_count] = ((*window_pos_center).getIntensity() / noise);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has this line been changed? I don't see a change in functionality.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of stn_estimates_ is map, map<PeakType, double>, so I used the iteration(*window_pos_center) to assign it.
But the type of stn_estimates now is vector, so I used a index(int window_count) instead.

virtual double getSignalToNoise(const PeakType & data_point)
{
if (!is_result_valid_)
virtual double getSignalToNoise(const Size index)
Copy link

@taranehstrunk taranehstrunk May 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call const Size index something that is related to data_point. Index doesn't really state anything about the variable's functionality.
Also it looks like there's an additional tab before virtual double...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see the last answer of mine.

@@ -106,46 +109,58 @@ namespace OpenMS

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we stick to the rule of six, don't we have to add move constructor and move assignment operator?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already added them (c with type Container).

@@ -275,7 +282,6 @@ namespace OpenMS
std::cerr << "TODO SignalToNoiseEstimatorMedian: the max_intensity_ value should be positive! " << max_intensity_ << std::endl;
return;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted empty line

@@ -1222,7 +1222,7 @@ namespace OpenMS
&& (shape.getFWHM() >= fwhm_bound_)
&& (shape.getFWHM() <= fwhm_upper_bound))
{
shape.signal_to_noise = sne.getSignalToNoise(area.max);
shape.signal_to_noise = sne.getSignalToNoise((Size)distance(raw_peak_array.begin(),area.max));
peak_shapes.push_back(shape);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to use emplace_back instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible.

@@ -1179,7 +1179,7 @@ namespace OpenMS
{
// if the signal to noise ratio at the max position is too small
// the peak isn't considered
if ((area.max != it_pick_end) && (sne.getSignalToNoise(area.max) < signal_to_noise_))
if ((area.max != it_pick_end) && (sne.getSignalToNoise((Size)distance(raw_peak_array.begin(),area.max)) < signal_to_noise_))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use it_pick_begin ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I can, but I ignored it before.

END_SECTION

START_SECTION((virtual double getSignalToNoise(const PeakType &data_point)))
START_SECTION((virtual double getSignalToNoise(const Size index)))
// hard to do without implementing computeSTN_ properly

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "properly" mean? Maybe it can be tested now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// hard to do without implementing computeSTN_ properly
// hard to do without implementing computeSTN_ properly

It is just a comment. Nothing between START_SECTION and END_SECTION can be tested.

Copy link

@taranehstrunk taranehstrunk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I would suggest to delete all the lines that have now been replaced instead of just commenting them out.

@xchen98 xchen98 reopened this May 20, 2020
@xchen98 xchen98 changed the title Optimization in SignalToNoiseEstimator of OpenSwathWorkflow Optimization in OpenSwathWorkflow May 20, 2020

namespace OpenMS
{
/**
* @brief Pre-calculate isotope distributions for interesting mass ranges
*/
class OPENMS_DLLAPI IsotopeDistributionCache
class OPENMS_DLLAPI IsotopeDistributionCache:
public FeatureFinderAlgorithmPickedHelperStructs
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why inherit from FeatureFinderAlgorithmPickedHelperStructs????

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I forget to delete it.


private:
/// Vector of pre-calculated isotope distributions for several mass windows
std::vector<TheoreticalIsotopePattern> isotope_distributions_;

std::vector<IsotopeDistribution> intensity_;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intensity_ is not a good name. Its a distribution...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll call the member distribution_cache_ instead.

@@ -394,37 +400,13 @@ namespace OpenMS
// create the theoretical distribution from the sum formula
EmpiricalFormula empf(sum_formula);
isotope_dist = empf.getIsotopeDistribution(CoarseIsotopePatternGenerator(dia_nr_isotopes_));
iso_.renormalize(isotopes, isotope_dist);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, the CoarseIsotopePatternGenerator is still called.
Are you sure this branch is not taken?
In any case, replace it with Cache...

{
Size num_isotopes = std::ceil(max_mass / mass_window_width) + 1;
mass_window_width_ = 20.0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move initialization to initializer list (as done for mass_window_width_(mass_window_width) before)


}

void IsotopeDistributionCache::renormalize(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can isotope_dist be a const param? You are only reading from it??

}


void IsotopeDistributionCache::calculateisotopeDistribution(OpenMS::Size num_begin)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calculateisotopeDistribution is a bit unspecific.
You are effectively extending the cache?

@@ -215,6 +216,8 @@ namespace OpenMS
bool dia_extraction_ppm_;
bool dia_centroided_;

IsotopeDistributionCache iso_;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you call this iso_cache_ or something which reflects its purpose?

if (index >= isotope_distributions_.size())
{
throw Exception::InvalidValue(__FILE__, __LINE__, OPENMS_PRETTY_FUNCTION, "IsotopeDistribution not precalculated. Maximum allowed index is " + String(isotope_distributions_.size()), String(index));
Size size = isotope_distributions_.size();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code should move into calculateisotopeDistribution.
simply pass the new maximum index which you need to have filled.
The rest should happen in the function.
Avoids code duplication (see b2b8728#diff-8d025ab465e7de035c1eae4c0d2440b9R145)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants