Seq align new #114

Open
wants to merge 61 commits into
base: develop
b93b264
added myself as author
Mar 31, 2021
b12f802
added myself as author
Mar 31, 2021
257a6b9
FASTA-Reader-Files without using Seqan
Apr 17, 2021
74e4708
Merge branch 'develop' of https://github.com/OpenMS/OpenMS into newbr…
Apr 19, 2021
e8a3837
Deleted the file from the git repository
Apr 19, 2021
c5d8951
Deleted the file from the git repository
Apr 19, 2021
5ceedcb
Deleted the file from the git repository
Apr 19, 2021
30c420f
Deleted the file from the git repository
Apr 19, 2021
87d5034
Fasta-Reader without using Seqan
Apr 19, 2021
42c320e
Fasta-Reader without using Seqan
Apr 19, 2021
e204e87
Fasta-Reader without using Seqan
Apr 20, 2021
77e168d
Fasta-Reader without using Seqan
Apr 23, 2021
9739aac
Fasta-Reader without using Seqan
Apr 23, 2021
b8d09dd
test
Apr 23, 2021
ab28d8d
merge and Fasta Reader
Apr 27, 2021
2685970
atEnd()
Apr 27, 2021
c9d1d50
test setPosition()
Apr 27, 2021
bcf9b78
test.tmp
Apr 27, 2021
127e660
Fasta Reader
Apr 28, 2021
2288487
atEnd()
Apr 28, 2021
dcfc270
merge-conflict
Apr 28, 2021
83d8429
Merge branch 'develop' of https://github.com/OpenMS/OpenMS into fasta…
Apr 28, 2021
40626c0
ProgressLogger
Apr 29, 2021
a633f58
\r
Apr 29, 2021
b545d88
eof
Apr 29, 2021
662920e
test position
Apr 30, 2021
5eb9ab5
delete_tmp
Apr 30, 2021
c59d491
fasta-reader
May 3, 2021
52062e9
merge-conflict
May 3, 2021
a7b2cf2
Merge branch 'fasta-reader' of https://github.com/noraw61/OpenMS into…
May 3, 2021
7360aa5
testPos()
May 3, 2021
25e8570
old-test-version
May 3, 2021
a62c4f6
Merge branch 'testbranch' of https://github.com/kasrat93/OpenMS into …
May 3, 2021
fa81f81
fasta-reader
May 3, 2021
ad32922
fasta-reader
May 3, 2021
fcc4d04
fasta-reader
May 3, 2021
82e3769
style changes
May 4, 2021
c9893a9
style changes
May 5, 2021
ca23e07
seq-align not yet compiling
May 11, 2021
5502882
seq-align not yet compiling
May 11, 2021
213b6a6
seq-align working but not yet optimized
May 13, 2021
5b15e72
seq-align working but not yet optimized
May 14, 2021
fa95d5f
NeedlemanWunsch class test
May 17, 2021
0547ce9
Exceptions
May 17, 2021
e200a99
new matrices
May 17, 2021
c4d402a
update param_ will follow
May 18, 2021
f433bc9
Vector
May 18, 2021
eb9bb99
include Test & edit Vector
May 18, 2021
b8c3716
getIndexNEW_ for Matrix[i][j]
May 18, 2021
edb665e
add break
May 18, 2021
7fc9be6
positive penalty and updatemembers_()
May 19, 2021
ec37a2c
Merge branch 'seq-align' of https://github.com/noraw61/OpenMS into Se…
May 19, 2021
c09a052
Replace getIndex() && insert final Matrices[26][26]
May 19, 2021
5ab8334
executables.cmake
May 20, 2021
e332e0a
Merge branch 'SeqAlignNew' of https://github.com/kasrat93/OpenMS into…
May 20, 2021
3084b22
seqalign
May 20, 2021
5f55b0b
set matrix
May 25, 2021
c8e6d5f
committing before changing branches
May 25, 2021
e96b2c7
Indentation and documentation still to follow
May 26, 2021
da20bc3
getMatrix()
May 26, 2021
42733fc
Indentation
May 26, 2021
5 changes: 5 additions & 0 deletions AUTHORS
@@ -1,3 +1,4 @@

=========================================================================
OpenMS -- Open Source Mass Spectrometry
=========================================================================
@@ -86,10 +87,14 @@ the authors tag in the respective file header.
- Swenja Wagner
- Taraneh Strunk
- Timo Sachsenberg
- Tinatin Kasradze
- Tom Lukas Lankenau
- Tom Waschischeck
- Uwe Schmitt
- Vipul Patel
- Volker Mosthaf
- Witold Wolski
- Xiao Liang



125 changes: 5 additions & 120 deletions src/openms/include/OpenMS/ANALYSIS/ID/ConsensusIDAlgorithmPEPMatrix.h
@@ -35,113 +35,7 @@
#pragma once

#include <OpenMS/ANALYSIS/ID/ConsensusIDAlgorithmSimilarity.h>
#include <OpenMS/DATASTRUCTURES/SeqanIncludeWrapper.h>

// Extend SeqAn by a user-define scoring matrix.
namespace seqan
{

// We have to create a new specialization of the _ScoringMatrix class
// for amino acids. For this, we first create a new tag.
struct PAM30MS {}; // PAM30MS matrix
struct AdaptedIdentity {}; // identity matrix adapted for I/L, Q/K ambiguity

// Then, we specialize the class _ScoringMatrix.
template <>
struct ScoringMatrixData_<int, AminoAcid, PAM30MS>
{
enum
{
VALUE_SIZE = ValueSize<AminoAcid>::VALUE,
TAB_SIZE = VALUE_SIZE * VALUE_SIZE
};
static inline const int* getData()
{
// Rant: I cannot find a primary source for the PAM30MS scoring matrix!
// It seems to have been first published in Huang et al., JBC 2001
// (http://www.jbc.org/content/276/30/28327), but the paper does not show
// the actual matrix (gah!).
// The matrix here comes from old OpenMS code and also matches this one:
// http://proteomics.fiocruz.br/supplementaryfiles/pepexplorer/BeforeRevision/PFUGridResults/PFUGridSearch/pam30ms.txt

static const int _data[TAB_SIZE] =
{
// A R N D C Q E G H I L K M F P S T W Y V B Z X *
/* A */ 6, -7, -4, -3, -6, -4, -2, -2, -7, -5, -6, -7, -5, -8, -2, 0, -1,-13, -8, -2, -7, -6, 0,-17,
/* R */ -7, 8, -6,-10, -8, -2, -9, -9, -2, -5, -7, 0, -4, -9, -4, -3, -6, -2,-10, -8, 5, -1, 0,-17,
/* N */ -4, -6, 8, 2,-11, -3, -2, -3, 0, -5, -6, -1, -9, -9, -6, 0, -2, -8, -4, -8, -4, -2, 0,-17,
/* D */ -3,-10, 2, 8,-14, -2, 2, -3, -4, -7,-10, -4,-11,-15, -8, -4, -5,-15,-11, -8, -7, -3, 0,-17,
/* C */ -6, -8,-11,-14, 10,-14,-14, -9, -7, -6,-11,-14,-13,-13, -8, -3, -8,-15, -4, -6,-11,-14, 0,-17,
/* Q */ -4, -2, -3, -2,-14, 8, 1, -7, 1, -8, -7, -3, -4,-13, -3, -5, -5,-13,-12, -7, -3, 4, 0,-17,
/* E */ -2, -9, -2, 2,-14, 1, 8, -4, -5, -5, -7, -4, -7,-14, -5, -4, -6,-17, -8, -6, -7, -2, 0,-17,
/* G */ -2, -9, -3, -3, -9, -7, -4, 6, -9,-11,-11, -7, -8, -9, -6, -2, -6,-15,-14, -5, -8, -7, 0,-17,
/* H */ -7, -2, 0, -4, -7, 1, -5, -9, 9, -9, -8, -6,-10, -6, -4, -6, -7, -7, -3, -6, -4, -3, 0,-17,
/* I */ -5, -5, -5, -7, -6, -8, -5,-11, -9, 8, 5, -6, -1, -2, -8, -7, -2,-14, -6, 2, -6, -7, 0,-17,
/* L */ -6, -7, -6,-10,-11, -7, -7,-11, -8, 5, 5, -7, 0, -3, -8, -8, -5,-10, -7, 0, -7, -7, 0,-17,
/* K */ -7, 0, -1, -4,-14, -3, -4, -7, -6, -6, -7, 7, -2,-14, -6, -4, -3,-12, -9, -9, 5, 4, 0,-17,
/* M */ -5, -4, -9,-11,-13, -4, -7, -8,-10, -1, 0, -2, 11, -4, -8, -5, -4,-13,-11, -1, -3, -3, 0,-17,
/* F */ -8, -9, -9,-15,-13,-13,-14, -9, -6, -2, -3,-14, -4, 9,-10, -6, -9, -4, 2, -8,-12,-14, 0,-17,
/* P */ -2, -4, -6, -8, -8, -3, -5, -6, -4, -8, -8, -6, -8,-10, 8, -2, -4,-14,-13, -6, -5, -5, 0,-17,
/* S */ 0, -3, 0, -4, -3, -5, -4, -2, -6, -7, -8, -4, -5, -6, -2, 6, 0, -5, -7, -6, -4, -5, 0,-17,
/* T */ -1, -6, -2, -5, -8, -5, -6, -6, -7, -2, -5, -3, -4, -9, -4, 0, 7,-13, -6, -3, -5, -4, 0,-17,
/* W */ -13, -2, -8,-15,-15,-13,-17,-15, -7,-14,-10,-12,-13, -4,-14, -5,-13, 13, -5,-15, -7,-13, 0,-17,
/* Y */ -8,-10, -4,-11, -4,-12, -8,-14, -3, -6, -7, -9,-11, 2,-13, -7, -6, -5, 10, -7,-10,-11, 0,-17,
/* V */ -2, -8, -8, -8, -6, -7, -6, -5, -6, 2, 0, -9, -1, -8, -6, -6, -3,-15, -7, 7, -9, -8, 0,-17,
/* B */ -7, 5, -4, -7,-11, -3, -7, -8, -4, -6, -7, 5, -3,-12, -5, -4, -5, -7,-10, -9, 5, 1, 0,-17,
/* Z */ -6, -1, -2, -3,-14, 4, -2, -7, -3, -7, -7, 4, -3,-14, -5, -5, -4,-13,-11, -8, 1, 4, 0,-17,
/* X */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,-17,
/* * */ -17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17, 1
};

return _data;
}
};

template <>
struct ScoringMatrixData_<int, AminoAcid, AdaptedIdentity>
{
enum
{
VALUE_SIZE = ValueSize<AminoAcid>::VALUE,
TAB_SIZE = VALUE_SIZE * VALUE_SIZE
};
static inline const int* getData()
{
static const int _data[TAB_SIZE] =
{
// A R N D C Q E G H I L K M F P S T W Y V B Z X *
/* A */ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* R */ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* N */ 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* D */ 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* C */ 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* Q */ 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* E */ 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* G */ 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* H */ 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* I */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* L */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* K */ 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* M */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* F */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* P */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* S */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, -17,
/* T */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, -17,
/* W */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -17,
/* Y */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, -17,
/* V */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, -17,
/* B */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, -17,
/* Z */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, -17,
/* X */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -17,
/* * */ -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, -17, 1
};

return _data;
}
};

} // namespace seqan

#include <OpenMS/ANALYSIS/SEQUENCE/NeedlemanWunsch.h>

namespace OpenMS
{
@@ -161,31 +55,22 @@ namespace OpenMS
/// Default constructor
ConsensusIDAlgorithmPEPMatrix();

private:
/// SeqAn similarity scoring
typedef ::seqan::Score<int, ::seqan::ScoreMatrix< ::seqan::AminoAcid, ::seqan::Default> > SeqAnScore;

/// SeqAn amino acid sequence
typedef ::seqan::String< ::seqan::AminoAcid> SeqAnSequence;

/// Similarity scoring method
SeqAnScore scoring_method_;
private:

/// Alignment data structure
::seqan::Align<SeqAnSequence, ::seqan::ArrayGaps> alignment_;
NeedlemanWunsch alignment_;

/// Not implemented
ConsensusIDAlgorithmPEPMatrix(const ConsensusIDAlgorithmPEPMatrix&);

/// Not implemented
ConsensusIDAlgorithmPEPMatrix& operator=(const ConsensusIDAlgorithmPEPMatrix&);

/// Docu in base class
void updateMembers_() override;

/// Sequence similarity based on substitution matrix (ignores PTMs)
double getSimilarity_(AASequence seq1, AASequence seq2) override;

// Docu in base class
void updateMembers_() override;
};

} // namespace OpenMS
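
The removed `AdaptedIdentity` table encodes a simple rule: identical residues score 1, I/L and Q/K are treated as interchangeable, `X` scores 0 against every residue, and `*` scores -17 against everything except itself. As a compact illustration of that rule (not code from this PR), the same scores could be expressed as a function. `canonicalResidue` and `adaptedIdentityScore` are hypothetical names, and the sketch assumes both inputs come from the 24-symbol alphabet used in the table.

```cpp
// Hypothetical helper (not part of OpenMS): map residues that the removed
// AdaptedIdentity table treats as interchangeable to one representative.
inline char canonicalResidue(char aa)
{
  switch (aa)
  {
    case 'L': return 'I'; // I/L ambiguity
    case 'K': return 'Q'; // Q/K ambiguity
    default:  return aa;
  }
}

// Hypothetical function reproducing the scores of the removed table.
// Assumes a and b are taken from "ARNDCQEGHILKMFPSTWYVBZX*".
inline int adaptedIdentityScore(char a, char b)
{
  // '*' scores 1 against itself and -17 against everything else.
  if (a == '*' || b == '*') return (a == b) ? 1 : -17;
  // 'X' scores 0 against every residue, including itself.
  if (a == 'X' || b == 'X') return 0;
  return canonicalResidue(a) == canonicalResidue(b) ? 1 : 0;
}
```

A lookup table, as in the removed code, avoids branching in the inner alignment loop; the function form is only meant to make the rule easy to read.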
45 changes: 45 additions & 0 deletions src/openms/include/OpenMS/ANALYSIS/SEQUENCE/NeedlemanWunsch.h
@@ -0,0 +1,45 @@
#include <vector>
#include <OpenMS/DATASTRUCTURES/String.h>
#include <OpenMS/OpenMSConfig.h>
#include <OpenMS/DATASTRUCTURES/DefaultParamHandler.h>

namespace OpenMS
{
class OPENMS_DLLAPI NeedlemanWunsch
{

public:
enum class ScoringMatrix
{
identity,
PAM30MS,
SIZE_OF_SCORINGMATRIX
};

NeedlemanWunsch(ScoringMatrix matrix, int penalty);
NeedlemanWunsch();

~NeedlemanWunsch()=default;

static const std::vector<std::string> NamesOfScoringMatrices;

int align(const String& seq1, const String& seq2);

void setMatrix(const ScoringMatrix& matrix);
void setMatrix(const std::string& matrix);

void setPenalty(const int penalty);

ScoringMatrix getMatrix() const;

int getPenalty() const;

private:
unsigned seq1_len_ = 0;
unsigned seq2_len_ = 0;
int gap_penalty_ = 0;
int my_matrix_ = 0;
std::vector<int> first_row_{};
std::vector<int> second_row_{};
};
}
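
The `first_row_` and `second_row_` members suggest that `align()` keeps only two rows of the dynamic-programming table. The header does not show the implementation, so the following is only a sketch of that technique under stated assumptions: `identityScore` and `alignScoreTwoRows` are hypothetical names, the substitution score is a plain identity match rather than the PR's matrices, and the gap penalty is taken as a non-negative cost per gap position (in line with the commit message "positive penalty").

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a substitution matrix: 1 for a match, 0 otherwise.
inline int identityScore(char a, char b)
{
  return a == b ? 1 : 0;
}

// Global alignment score (Needleman-Wunsch) with a linear gap penalty,
// keeping only two rows of the DP table in memory.
// gap_penalty is assumed to be a non-negative cost per gap position.
inline int alignScoreTwoRows(const std::string& seq1, const std::string& seq2, int gap_penalty)
{
  const std::size_t n = seq1.size();
  const std::size_t m = seq2.size();
  std::vector<int> prev(m + 1), curr(m + 1);

  // Row 0: an empty prefix of seq1 aligned against j characters of seq2.
  for (std::size_t j = 0; j <= m; ++j)
  {
    prev[j] = -static_cast<int>(j) * gap_penalty;
  }

  for (std::size_t i = 1; i <= n; ++i)
  {
    // Column 0: i characters of seq1 aligned against an empty prefix of seq2.
    curr[0] = -static_cast<int>(i) * gap_penalty;
    for (std::size_t j = 1; j <= m; ++j)
    {
      const int diag = prev[j - 1] + identityScore(seq1[i - 1], seq2[j - 1]);
      const int up   = prev[j] - gap_penalty;     // gap in seq2
      const int left = curr[j - 1] - gap_penalty; // gap in seq1
      curr[j] = std::max({diag, up, left});
    }
    std::swap(prev, curr); // the finished row becomes the previous row
  }
  return prev[m];
}
```

With this layout, memory grows with the length of `seq2` only instead of with the product of both lengths. The trade-off is that the alignment itself cannot be traced back, which is acceptable when only the score is needed.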
8 changes: 5 additions & 3 deletions src/openms/include/OpenMS/DATASTRUCTURES/FASTAContainer.h
@@ -101,7 +101,8 @@ class FASTAContainer<TFI_File>
offsets_(),
data_fg_(),
data_bg_(),
chunk_offset_(0)
chunk_offset_(0),
filename_(FASTA_file)
{
f_.readStart(FASTA_file);
}
@@ -199,19 +200,19 @@ class FASTAContainer<TFI_File>
}

/// is the FASTA file empty?
bool empty() const
bool empty()
{ // trusting the FASTA file can be read...
return f_.atEnd() && offsets_.empty();
}

/// resets reading of the FASTA file, enables fresh reading of the FASTA from the beginning
void reset()
{
f_.setPosition(0);
offsets_.clear();
data_fg_.clear();
data_bg_.clear();
chunk_offset_ = 0;
f_.readStart(filename_);
}


@@ -231,6 +232,7 @@ class FASTAContainer<TFI_File>
std::vector<FASTAFile::FASTAEntry> data_fg_; ///< active (foreground) data
std::vector<FASTAFile::FASTAEntry> data_bg_; ///< prefetched (background) data; will become the next active data
size_t chunk_offset_; ///< number of entries before the current chunk
std::string filename_;///< FASTA file name
};

/**
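
The `FASTAContainer.h` hunks store the file name in a new `filename_` member and have `reset()` call `f_.readStart(filename_)` in place of `f_.setPosition(0)`. The general pattern, restarting a reader by reopening a remembered path, can be sketched with the standard library alone. `LineReader` is a hypothetical class for illustration and does not reflect the OpenMS `FASTAFile` API.

```cpp
#include <fstream>
#include <string>

// Hypothetical minimal reader (not the OpenMS FASTAFile API) that remembers
// its path so it can restart from the beginning by reopening the file.
class LineReader
{
public:
  explicit LineReader(const std::string& filename) : filename_(filename)
  {
    open_();
  }

  // Reads the next line; returns false at end of file or if the file is not readable.
  bool next(std::string& line)
  {
    return static_cast<bool>(std::getline(in_, line));
  }

  // Restart from the beginning by reopening the stored path.
  void reset()
  {
    open_();
  }

private:
  void open_()
  {
    in_.close();
    in_.clear(); // drop eof/fail flags left over from the previous pass
    in_.open(filename_);
  }

  std::string filename_; // path kept for reopening
  std::ifstream in_;
};
```

Reopening discards any stream state accumulated by the previous pass, which a plain seek on a stream that has already hit end of file would not do without an explicit `clear()`.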