Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

confusion about the smooth_gaussian_kernel.cpp with mnnCorrect() #20

Open
yuxiaokang-source opened this issue Sep 29, 2020 · 5 comments

Comments

@yuxiaokang-source
Copy link

yuxiaokang-source commented Sep 29, 2020

I want to implement the mnnCorrect() function with R(without C++), so I read the source code of smooth_gaussian_kernel.cpp. In the smooth_gaussian_kernel function:
const size_t ngenes=averaged.nrow();
const size_t nmnn=averaged.ncol();
if (nmnn!=index.size()) { throw std::runtime_error("'index' must have length equal to number of rows in 'averaged'"); }

the variable nmnn represents the column of "averaged", why it should be equal to number of rows in "averaged" in the if condition?

When I debug the .cpp source file with lldb, I found I have to transpose the "averaged" before I call the smooth_gaussion_kernel function in the smooth_gaussian_kernel.cpp, otherwise It will throw the error 'index' must have length equal to number of rows in 'averaged'. but in the R code where mnnCorrect call this C++ function.
function (data1, data2, mnn1, mnn2, tdata2, sigma) { vect <- data1[mnn1, , drop = FALSE] - data2[mnn2, , drop = FALSE] cell.vect <- .Call(cxx_smooth_gaussian_kernel, vect, mnn2 - 1L, tdata2, sigma) t(cell.vect) }
It didn't transpose the vect(in my simulation data,the number of columns of vect represents the number of genes ), but it give the correct result and didn't throw an error, I get confused and dont know why this happens? In a word, I have two problems,

  1. I think the if condition should be
    if n_genes==index.size() instead of if (nmnn!=index.size()),
  2. I should transpose with the first argument of the smooth_gaussion_kernel function(the vert or the averaged) or not ?

Originally posted by @yuxiaokang-source in #18 (comment)

@LTLA
Copy link
Owner

LTLA commented Sep 30, 2020

Is there an actual error you want to report, or are you just poking around in the C++ code for fun?

It's fine if it's the latter, I just want to know.

@yuxiaokang-source
Copy link
Author

yuxiaokang-source commented Sep 30, 2020

In the source code with the package , It run fine and didn't throw any error,So It isn't an actual error, I just very confused with the C++ code. In my opinion ,It should throw an error but the paragram didn't in fact. Can you give me some hints about this problem? I wonder I have demonstrate this problem or not?

@LTLA
Copy link
Owner

LTLA commented Sep 30, 2020

  1. I think the if condition should be
    if n_genes==index.size() instead of if (nmnn!=index.size()),

The existing logic is correct, the number of columns of averaged should be the same as index.size(). It's just the error message that's wrong, which has no consequence whatsoever because the error message should never be triggered by incorrect user inputs. (The check is only provided in C++ to avoid segmentation faults that would crash the entire R sesion.)

2. I should transpose with the first argument of the smooth_gaussion_kernel function(the vert or the averaged) or not ?

Clearly the current code is working correctly, so you should just do whatever it's doing.

@yuxiaokang-source
Copy link
Author

yuxiaokang-source commented Sep 30, 2020

截屏2020-09-30 下午4 30 58
As you can see in the picture:
The number of rows of vect(the averaged in smooth_gaussian_kernel.cpp) is equal to the length of mnn2(the index in smooth_gaussian_kernel.cpp).This is a contradiction to

const size_t ngenes=averaged.nrow();
const size_t nmnn=averaged.ncol();
  if (nmnn!=index.size()) {
        throw std::runtime_error("'index' must have length equal to number of columns in 'averaged'");
    }

My simulation code is:

rm(list=ls())
set.seed(1)
library(batchelor)
library(scran)
library(scater)
suppressMessages(library(splatter))
params <- newSplatParams()
params <- setParam(params, "nGenes", 2000)
params <- setParam(params, "batchCells", c(200,200,200))
params <- setParam(params, "batch.facLoc", 0.5)
params <- setParam(params, "batch.facScale", 0.5)
params <- setParam(params, "group.prob", c(1/3,1/3,1/3))
sim <- splatSimulate(params, method="groups", verbose=FALSE)
sim <- computeSumFactors(sim)
sim <- normalize(sim)
library(batchelor)
dat1=sim[,colData(sim)$Batch=="Batch1"]
dat2=sim[,colData(sim)$Batch=="Batch2"]
dat3=sim[,colData(sim)$Batch=="Batch3"]
g.out=mnnCorrect(dat1,dat2,dat3)
logcounts(g.out)=assay(g.out,"corrected")
g.out <- runTSNE(g.out)
print(plotTSNE(g.out, colour_by="batch"))
g.out$Group=colData(sim)$Group#

It works well ,but I can't figure out why this can work? Can you give me some suggestions?
Thank you very much!

@LTLA
Copy link
Owner

LTLA commented Oct 12, 2020

Your installation is quite out of date compared to the code in this repository. For example, the corresponding R code is now:

batchelor/R/mnnCorrect.R

Lines 453 to 456 in 719f63a

vect <- data1[mnn1,,drop=FALSE] - data2[mnn2,,drop=FALSE]
averaged <- sumCountsAcrossCells(t(vect), DataFrame(ID=mnn2), average=TRUE)
cell.vect <- smooth_gaussian_kernel(assay(averaged, withDimnames=FALSE), averaged$ID-1L, tdata2, sigma)
t(cell.vect)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants