Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SearchDatabase Reupload für Codereview #140

Open
wants to merge 19 commits into
base: develop
Choose a base branch
from

Conversation

Maxalcer
Copy link

@Maxalcer Maxalcer commented May 4, 2023

Description

Checklist

  • Make sure that you are listed in the AUTHORS file
  • Add relevant changes and new features to the CHANGELOG file
  • I have commented my code, particularly in hard-to-understand areas
  • New and existing unit tests pass locally with my changes
  • Updated or added python bindings for changed or new classes (Tick if no updates were necessary.)

How can I get additional information on failed tests during CI

Click to expand If your PR is failing you can check out
  • The details of the action statuses at the end of the PR or the "Checks" tab.
  • http://cdash.openms.de/index.php?project=OpenMS and look for your PR. Use the "Show filters" capability on the top right to search for your PR number.
    If you click in the column that lists the failed tests you will get detailed error messages.

Advanced commands (admins / reviewer only)

Click to expand
  • /reformat (experimental) applies the clang-format style changes as additional commit. Note: your branch must have a different name (e.g., yourrepo:feature/XYZ) than the receiving branch (e.g., OpenMS:develop). Otherwise, reformat fails to push.
  • setting the label "NoJenkins" will skip tests for this PR on jenkins (saves resources e.g., on edits that do not affect tests)
  • commenting with rebuild jenkins will retrigger Jenkins-based CI builds

⚠️ Note: Once you opened a PR try to minimize the number of pushes to it as every push will trigger CI (automated builds and test) and is rather heavy on our infrastructure (e.g., if several pushes per day are performed).

* @param experiment Input MSExperiment containing of MS2 Spectra
* @param candidates Output vector of found candidates with the index of MSSpectrum in experiment
*/
void search(MSExperiment& experiment, std::vector<std::pair<std::vector<Candidate>, size_t>>& candidates) const;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very usefull to make the functions const, makes it easier to parall. for other people :)

std::vector<SearchDatabase::Peptide_> generate_peptides_(const std::vector<FASTAFile::FASTAEntry>& entries) const;
/// Merges presorted Chunks of Peptide-Fragments inplace
void fragment_merge_(int first, int last, const std::vector<int>& chunks, std::vector<SearchDatabase::Fragment_>& input) const;
///generates sortet vector with all theoretical Fragments for all theoretical Peptides

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very small addition: sortet -> sorted

return all_peptides;
}

void SearchDatabase::fragment_merge_(int first, int last, const std::vector<int>& chunks, std::vector<SearchDatabase::Fragment_>& input) const

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const _ & for first and last possible?

if (last - first > 1)
{
int mid = first + (last-first) / 2;
#pragma omp parallel sections

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea for optimisation

std::vector<Fragment_> all_frags;
std::vector<int> chunk_start = {0}; //for fragment_merge need start and end of presorted chunks

for (size_t i = 0; i < all_peptides_.size(); i++)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Para possible? emplace_back() could be changed to [] operator for no conflict

#include <OpenMS/KERNEL/Peak1D.h>
#include <OpenMS/CHEMISTRY/ProteaseDigestion.h>
#include <OpenMS/CHEMISTRY/TheoreticalSpectrumGenerator.h>
#include <vector>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding convention (alph. order)

std::vector<SearchDatabase::Peptide_> generate_peptides_(const std::vector<FASTAFile::FASTAEntry>& entries) const;
/// Merges presorted Chunks of Peptide-Fragments inplace
void fragment_merge_(int first, int last, const std::vector<int>& chunks, std::vector<SearchDatabase::Fragment_>& input) const;
///generates sortet vector with all theoretical Fragments for all theoretical Peptides
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///generates sortet vector with all theoretical Fragments for all theoretical Peptides
///generates sorted vector with all theoretical Fragments for all theoretical Peptides

#include <OpenMS/CHEMISTRY/AASequence.h>
#include <OpenMS/DATASTRUCTURES/DefaultParamHandler.h>
#include <OpenMS/FORMAT/FASTAFile.h>
#include <OpenMS/KERNEL/MSExperiment.h>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <OpenMS/KERNEL/MSExperiment.h>

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesnt work with forwarding

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you include the header in your cpp?
(because MSExperiment must be known when compiling SearchDatabase.cpp)

Comment on lines 46 to 48
namespace OpenMS
{
/** @brief Creating a two level tree from a given FASTAFile by generating the theoretical peptides and their b-y-ions and sorting
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
namespace OpenMS
{
/** @brief Creating a two level tree from a given FASTAFile by generating the theoretical peptides and their b-y-ions and sorting
namespace OpenMS
{
class MSExperiment;
/** @brief Creating a two level tree from a given FASTAFile by generating the theoretical peptides and their b-y-ions and sorting

namespace OpenMS
{
/** @brief Creating a two level tree from a given FASTAFile by generating the theoretical peptides and their b-y-ions and sorting
* them in the outer tree by fragment-MZ and the inner tree by precursor-MZ. Search function to find candidates to experimental MS2 sepctra
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add link to blog of sagems

Comment on lines 78 to 81

* @param spectrum Input MS2 Spectrum
* @param candidates Output vector of found candidatest
*/
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens for an MS1 spectrum

AASequence sequence_;
size_t protein_index_;
double peptide_mz_;
Peptide_() = default;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if needed;\iff needed then intialize the members

size_t protein_index_;
double peptide_mz_;
Peptide_() = default;
Peptide_(const Peptide_&) = default;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I claim: move ctor is deleted but should exist

std::vector<SearchDatabase::Fragment_> generate_fragments_() const;

std::string digestor_enzyme_;
size_t missed_cleavages_;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
size_t missed_cleavages_;
size_t missed_cleavages_; ///< number of missed cleavages

size_t count_found = 0;
size_t count_rounds = 0;

auto prec_unit_cond = precursor_mz_tolerance_unit_ == "Da";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto prec_unit_cond = precursor_mz_tolerance_unit_ == "Da";
auto prec_unit_cond = precursor_mz_tolerance_unit_ == UNIT_DA;


return all_frags;
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
char* UNIT_DA = "Da";

ProteaseDB::getInstance()->getAllNames(all_enzymes);
vector<String> all_mods;
ModificationsDB::getInstance()->getAllSearchModifications(all_mods);
vector<string> tolerance_units{"Da", "ppm"};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
vector<string> tolerance_units{"Da", "ppm"};
vector<string> tolerance_units{UNIT_DA, "ppm"};

defaults_.setValue("peptide_max_length", 50, "maximal peptide length for database");
defaults_.setValue("fragment_min_mz", 150, "minimal fragment mz for database");
defaults_.setValue("fragment_max_mz", 2000, "maximal fragment mz for database");
defaults_.setValue("precursor_mz_tolerance", 2, "tolerance for precursor-MZ in search");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defaults_.setValue("precursor_mz_tolerance", 2, "tolerance for precursor-MZ in search");
defaults_.setValue("precursor_mz_tolerance", 2, "tolerance for precursor-m/z in search");

Comment on lines 198 to 201
#pragma omp parallel
{
// sorting the fragments of each bucket by the precursor-MZ
#pragma omp for
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#pragma omp parallel
{
// sorting the fragments of each bucket by the precursor-MZ
#pragma omp for
// sorting the fragments of each bucket by the precursor-MZ
#pragma omp parallel for
{

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this { in line 200 needed. Cause our code only works without it?!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, please re-add it

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or delete the corresponding } after the for loop (probably even better this way)

Maxalcer and others added 14 commits May 5, 2023 14:36
…\n added test to check if Datasructure is sorted \n added UNIT_DA and UNIT_PPM to Constants.h
Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
Co-authored-by: Timo Sachsenberg <timo.sachsenberg@uni-tuebingen.de>
Co-authored-by: Julianus Pfeuffer <pfeuffer@informatik.uni-tuebingen.de>
Co-authored-by: Julianus Pfeuffer <pfeuffer@informatik.uni-tuebingen.de>
Co-authored-by: Julianus Pfeuffer <pfeuffer@informatik.uni-tuebingen.de>
Co-authored-by: Julianus Pfeuffer <pfeuffer@informatik.uni-tuebingen.de>
Co-authored-by: Julianus Pfeuffer <pfeuffer@informatik.uni-tuebingen.de>
…ead of AASequence, changed name to FragmentIndex
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants