Fix for row order of struct mat
Consistent with f41988c, order in .samples file will be respected. This will not have affected uses of kmds, but may have affected uses of mash
johnlees committed Mar 23, 2017
1 parent f5c3f64 commit 0799574
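
The behaviour described in the commit message (matrix rows returned in the order of the requested samples, looked up by name in the `.samples` file) can be sketched without Armadillo or HDF5. This is a minimal illustration under stated assumptions, not seer's code: `rowOrder` and its parameters are hypothetical names, and a plain `std::vector` of indices stands in for `arma::uvec`.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of seer): for each requested sample, find the
// row it occupies in the matrix file, preserving the requested order.
std::vector<std::size_t> rowOrder(const std::vector<std::string>& file_names,
                                  const std::vector<std::string>& requested)
{
    // Map each name listed in the file to its row number
    std::map<std::string, std::size_t> idx;
    for (std::size_t row = 0; row < file_names.size(); ++row)
    {
        idx[file_names[row]] = row;
    }

    // One entry per requested sample, so every entry of the result is assigned
    std::vector<std::size_t> order;
    order.reserve(requested.size());
    for (const auto& name : requested)
    {
        auto it = idx.find(name);
        if (it == idx.end())
        {
            throw std::runtime_error("Could not find sample " + name);
        }
        order.push_back(it->second);
    }
    return order;
}
```

With file names `s1 s2 s3 s4` and requested samples `s3 s1`, this returns the indices `2, 0`, which could then be used to select rows in that order. A name-based lookup of this kind does not require the two lists to be sorted the same way, which is the point of the fix.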
Showing 3 changed files with 32 additions and 25 deletions.
19 changes: 8 additions & 11 deletions README.md
@@ -3,19 +3,18 @@
 Sequence element enrichment analysis. This document contains
 installation instuctions. Usage can be found on the [wiki](https://github.com/johnlees/seer/wiki/Usage), and more information in the [paper](http://www.nature.com/articles/ncomms12797).

-Installation
-==============
-###Use a pre-compiled release
+## Installation
+### Use a pre-compiled release

 Head to the [release](https://github.com/johnlees/seer/releases) page and download and unpack the tarball. If you have the dependencies installed use the dynamic version, otherwise use the static version (tested on Ubuntu only; static_all should work on other 64-bit Linux platforms).

-###Use on a virtual machine
+### Use on a virtual machine

 We have a virtual machine, containing SEER and other useful bioinformatics programs, which is available at
 ftp://ftp.sanger.ac.uk/pub/pathogens/pathogens-vm/pathogens-vm.latest.ova
 and can be imported as an appliance in [VirtualBox](https://www.virtualbox.org/).

-###Compile source code
+### Compile source code

 First clone the repository

@@ -32,8 +31,7 @@ Currently tested on Linux only, installation should proceed as

 Full installation instructions are available <a href="#installation-on-ubuntubiolinux">below</a>

-Dependencies
---------------
+## Dependencies
 seer currently depends on

 - gzstream <http://www.cs.unc.edu/Research/compgeom/gzstream/>
@@ -49,7 +47,7 @@ You will also require

 You probably already have boost, HDF5 and dlib (as long as you did clone --recursive).

-###Installation on Ubuntu/biolinux
+### Installation on Ubuntu/biolinux

 Running the following commands will install seer

@@ -75,7 +73,7 @@ Running the following commands will install seer
 cd ..
 cd src && make CXX=/usr/bin/g++-4.9

-###General installation instructions
+### General installation instructions

 **gzstream**

@@ -142,6 +140,5 @@ do by running
 make CXX=g++-4.9


-Usage, interpretation of results, and troubleshooting
-=============
+## Usage, interpretation of results, and troubleshooting
 See the [wiki](https://github.com/johnlees/seer/wiki/Usage)
36 changes: 23 additions & 13 deletions src/seerIO.cpp
@@ -138,6 +138,7 @@ arma::mat readHDF5(const std::string& file_name)

 arma::mat readMDS(const std::string& file_name, const std::vector<Sample>& sample_names)
 {
+    std::map<std::string,unsigned int> mds_idx;
     arma::mat MDS = readHDF5(file_name);

     // Check that the sample names match up
@@ -147,31 +148,40 @@ arma::mat readMDS(const std::string& file_name, const std::vector<Sample>& sampl
     std::ifstream samples_in(sample_name_file.c_str());
     if (samples_in)
     {
-        arma::uvec keep_indices(sample_names.size());
-        unsigned int sample_row = 0;
+        arma::uvec keep_indices(MDS.n_rows);
         unsigned int file_row = 0;

+        // Read in sample file to get MDS row order
         while (samples_in)
         {
             std::string sample_name;
             samples_in >> sample_name;

-            // Must be ordered, and lines in sample_names be a subset of what
-            // is in the file. Otherwise a non-compatible mds will be returned
-            // which will throw
-            if (sample_name == sample_names.at(sample_row).iid())
+            if (samples_in)
             {
-                keep_indices(sample_row) = file_row;
-                if (++sample_row >= sample_names.size())
-                {
-                    break;
-                }
+                mds_idx[sample_name] = file_row;
             }
             ++file_row;
         }

+        // Get MDS rows (using sample file read above) in same sorted order as
+        // sample vector
+        unsigned int sample_row = 0;
+        for (auto it = sample_names.begin(); it != sample_names.end(); ++it)
+        {
+            auto find_it = mds_idx.find(it->iid());
+            if (find_it == mds_idx.end())
+            {
+                throw std::runtime_error("Could not find sample " + it->iid() + " in the pheno file");
+            }
+            else
+            {
+                keep_indices(sample_row) = find_it->second;
+            }
+            sample_row++;
+        }

         // Only keep the rows where the pheno file has data
-        if (sample_row == sample_names.size())
+        if (mds_idx.size() >= sample_names.size())
         {
             MDS = MDS.rows(keep_indices);
         }
2 changes: 1 addition & 1 deletion src/seercommon.hpp
@@ -41,7 +41,7 @@
 #include "covar.hpp"

 // Constants
-const std::string VERSION = "1.2alpha2";
+const std::string VERSION = "1.2alpha3";
 // Default options
 const double maf_default = 0.01;
 const long int max_length_default = 100;