Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEVER MERGE: protype for modular/streaming simplifier #443

Draft
wants to merge 55 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
ce926cb
Bump version to 0.13.0-alpha.0
molpopgen Dec 23, 2022
09d81c4
feat: Use Bookmark in haploid_wright_fisher example. (#441)
molpopgen Jan 11, 2023
a506523
feat: Implement efficient edge buffering
molpopgen Dec 23, 2022
28eab9b
commenting out code == tests still pass
molpopgen Jan 11, 2023
4a1d121
note re: perf
molpopgen Jan 12, 2023
c281077
Add prototypes for stuff we will need.
molpopgen Jan 13, 2023
dbd018f
init the simplifier
molpopgen Jan 13, 2023
773f363
move comment
molpopgen Jan 13, 2023
0f3e472
lots of C fixes re: compiling
molpopgen Jan 13, 2023
0946665
now we build
molpopgen Jan 13, 2023
b4ef1b4
free
molpopgen Jan 13, 2023
09889cd
infrastructure
molpopgen Jan 13, 2023
500448c
getting safe-ish wrapper in place.
molpopgen Jan 13, 2023
0c48cb2
some details
molpopgen Jan 13, 2023
d429502
what needs to happen
molpopgen Jan 13, 2023
db85335
draft impl for simplifying using streaming simplifier
molpopgen Jan 16, 2023
1c14bd9
make new fn pub
molpopgen Jan 16, 2023
d55895e
fix segfault. find safety issues that is a bit of a doozy.
molpopgen Jan 16, 2023
11bbd02
fix finalise
molpopgen Jan 16, 2023
4ed3977
confusion reigns right now
molpopgen Jan 16, 2023
ceffa91
more print debug
molpopgen Jan 16, 2023
4bc9a05
hmmm
molpopgen Jan 16, 2023
1df577a
C-side stress
molpopgen Jan 16, 2023
1665c86
not erroring any more
molpopgen Jan 16, 2023
bb1eba4
clean out debug code
molpopgen Jan 16, 2023
6955076
on our way to copy-free
molpopgen Jan 16, 2023
d8df019
more back end
molpopgen Jan 16, 2023
60520ac
borrow checker hates this
molpopgen Jan 16, 2023
7465144
maybe on to something
molpopgen Jan 16, 2023
510b7bb
tests fail for overlapping generations
molpopgen Jan 16, 2023
b90060e
some tidy, some messy
molpopgen Jan 16, 2023
a8bcf57
different failure point now
molpopgen Jan 16, 2023
7669cd3
works for overlapping gens now
molpopgen Jan 16, 2023
a13f0d6
start to abstract out so that we can directly compare results via tests
molpopgen Jan 17, 2023
389adc6
standard case
molpopgen Jan 17, 2023
7f571e6
and now the new hotness
molpopgen Jan 17, 2023
46f20d7
getting there
molpopgen Jan 17, 2023
cd1ecbe
delete reduntant test
molpopgen Jan 17, 2023
60d5b86
branch lengths messing up?
molpopgen Jan 17, 2023
27672fb
nope, still broken
molpopgen Jan 17, 2023
79f57e7
num edges is the problem
molpopgen Jan 17, 2023
08f7056
refactor for fmt/lsp
molpopgen Jan 17, 2023
d411782
improve match in tests
molpopgen Jan 17, 2023
bed6ebf
tests pass
molpopgen Jan 17, 2023
feaadf1
remove uneeded test fn in trait
molpopgen Jan 17, 2023
14d33bf
dedup the tests
molpopgen Jan 17, 2023
91463a1
add crossover to the test
molpopgen Jan 17, 2023
efb9bda
one fix due to a failing proptest
molpopgen Jan 17, 2023
ba3e68e
fix error in registering children during post-simplification bufferin…
molpopgen Jan 17, 2023
abed0be
remove duplicated effort
molpopgen Jan 17, 2023
937528b
;
molpopgen Jan 17, 2023
f554b0e
note for the future
molpopgen Jan 17, 2023
4b11d75
make popsize/nsteps top-level variables in test
molpopgen Jan 18, 2023
7239232
update TODO list
molpopgen Jan 18, 2023
86bd18c
simplify things a lot
molpopgen Jan 18, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
[package]
name = "tskit"
version = "0.12.0"
version = "0.13.0-alpha.0"
authors = ["tskit developers <admin@tskit.dev>"]
build = "build.rs"
edition = "2021"
Expand Down Expand Up @@ -50,6 +50,7 @@ pkg-config = "0.3"
[features]
provenance = ["humantime"]
derive = ["tskit-derive", "serde", "serde_json", "bincode"]
edgebuffer = []

[package.metadata.docs.rs]
all-features = true
Expand All @@ -58,3 +59,7 @@ rustdoc-args = ["--cfg", "doc_cfg"]
# Not run during tests
[[example]]
name = "tree_traversals"

[[example]]
name = "haploid_wright_fisher_edge_buffering"
required-features = ["edgebuffer"]
35 changes: 32 additions & 3 deletions examples/haploid_wright_fisher.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,30 @@ use proptest::prelude::*;
use rand::distributions::Distribution;
use rand::SeedableRng;

fn rotate_edges(bookmark: &tskit::types::Bookmark, tables: &mut tskit::TableCollection) {
let num_edges = tables.edges().num_rows().as_usize();
let left =
unsafe { std::slice::from_raw_parts_mut((*tables.as_mut_ptr()).edges.left, num_edges) };
let right =
unsafe { std::slice::from_raw_parts_mut((*tables.as_mut_ptr()).edges.right, num_edges) };
let parent =
unsafe { std::slice::from_raw_parts_mut((*tables.as_mut_ptr()).edges.parent, num_edges) };
let child =
unsafe { std::slice::from_raw_parts_mut((*tables.as_mut_ptr()).edges.child, num_edges) };
let mid = bookmark.edges().as_usize();
left.rotate_left(mid);
right.rotate_left(mid);
parent.rotate_left(mid);
child.rotate_left(mid);
}

// ANCHOR: haploid_wright_fisher
fn simulate(
seed: u64,
popsize: usize,
num_generations: i32,
simplify_interval: i32,
update_bookmark: bool,
) -> Result<tskit::TreeSequence> {
if popsize == 0 {
return Err(anyhow::Error::msg("popsize must be > 0"));
Expand Down Expand Up @@ -46,6 +64,7 @@ fn simulate(
let parent_picker = rand::distributions::Uniform::new(0, popsize);
let breakpoint_generator = rand::distributions::Uniform::new(0.0, 1.0);
let mut rng = rand::rngs::StdRng::seed_from_u64(seed);
let mut bookmark = tskit::types::Bookmark::new();

for birth_time in (0..num_generations).rev() {
for c in children.iter_mut() {
Expand All @@ -64,7 +83,10 @@ fn simulate(
}

if birth_time % simplify_interval == 0 {
tables.full_sort(tskit::TableSortOptions::default())?;
tables.sort(&bookmark, tskit::TableSortOptions::default())?;
if update_bookmark {
rotate_edges(&bookmark, &mut tables);
}
if let Some(idmap) =
tables.simplify(children, tskit::SimplificationOptions::default(), true)?
{
Expand All @@ -73,6 +95,9 @@ fn simulate(
*o = idmap[usize::try_from(*o)?];
}
}
if update_bookmark {
bookmark.set_edges(tables.edges().num_rows());
}
}
std::mem::swap(&mut parents, &mut children);
}
Expand All @@ -91,6 +116,8 @@ struct SimParams {
num_generations: i32,
simplify_interval: i32,
treefile: Option<String>,
#[clap(short, long, help = "Use bookmark to avoid sorting entire edge table.")]
bookmark: bool,
}

fn main() -> Result<()> {
Expand All @@ -100,6 +127,7 @@ fn main() -> Result<()> {
params.popsize,
params.num_generations,
params.simplify_interval,
params.bookmark,
)?;

if let Some(treefile) = &params.treefile {
Expand All @@ -114,8 +142,9 @@ proptest! {
#[test]
fn test_simulate_proptest(seed in any::<u64>(),
num_generations in 50..100i32,
simplify_interval in 1..100i32) {
let ts = simulate(seed, 100, num_generations, simplify_interval).unwrap();
simplify_interval in 1..100i32,
bookmark in proptest::bool::ANY) {
let ts = simulate(seed, 100, num_generations, simplify_interval, bookmark).unwrap();

// stress test the branch length fn b/c it is not a trivial
// wrapper around the C API.
Expand Down
158 changes: 158 additions & 0 deletions examples/haploid_wright_fisher_edge_buffering.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
// This is a rust implementation of the example
// found in tskit-c

use anyhow::Result;
use clap::Parser;
#[cfg(test)]
use proptest::prelude::*;
use rand::distributions::Distribution;
use rand::SeedableRng;
use tskit::NodeId;

// ANCHOR: haploid_wright_fisher_edge_buffering
fn simulate(
seed: u64,
popsize: usize,
num_generations: i32,
simplify_interval: i32,
) -> Result<tskit::TreeSequence> {
if popsize == 0 {
return Err(anyhow::Error::msg("popsize must be > 0"));
}
if num_generations == 0 {
return Err(anyhow::Error::msg("num_generations must be > 0"));
}
if simplify_interval == 0 {
return Err(anyhow::Error::msg("simplify_interval must be > 0"));
}
let mut tables = tskit::TableCollection::new(1.0)?;

// create parental nodes
let mut parents_and_children = {
let mut temp = vec![];
let parental_time = f64::from(num_generations);
for _ in 0..popsize {
let node = tables.add_node(0, parental_time, -1, -1)?;
temp.push(node);
}
temp
};

// allocate space for offspring nodes
parents_and_children.resize(2 * parents_and_children.len(), tskit::NodeId::NULL);

// Construct non-overlapping mutable slices into our vector.
let (mut parents, mut children) = parents_and_children.split_at_mut(popsize);

let parent_picker = rand::distributions::Uniform::new(0, popsize);
let breakpoint_generator = rand::distributions::Uniform::new(0.0, 1.0);
let mut rng = rand::rngs::StdRng::seed_from_u64(seed);
let mut buffer = tskit::EdgeBuffer::default();
let mut node_map: Vec<NodeId> = vec![];

for birth_time in (0..num_generations).rev() {
for c in children.iter_mut() {
let bt = f64::from(birth_time);
let child = tables.add_node(0, bt, -1, -1)?;
let left_parent = parents
.get(parent_picker.sample(&mut rng))
.ok_or_else(|| anyhow::Error::msg("invalid left_parent index"))?;
let right_parent = parents
.get(parent_picker.sample(&mut rng))
.ok_or_else(|| anyhow::Error::msg("invalid right_parent index"))?;
//buffer.setup_births(&[*left_parent, *right_parent], &[child])?;
let breakpoint = breakpoint_generator.sample(&mut rng);
buffer.buffer_birth(*left_parent, child, 0., breakpoint)?;
buffer.buffer_birth(*right_parent, child, breakpoint, 1.0)?;
//buffer.finalize_births();
*c = child;
}

if birth_time % simplify_interval == 0 {
//buffer.pre_simplification(&mut tables)?;
//tables.full_sort(tskit::TableSortOptions::default())?;
node_map.resize(tables.nodes().num_rows().as_usize(), tskit::NodeId::NULL);
tskit::simplfify_from_buffer(
children,
tskit::SimplificationOptions::default(),
&mut tables,
&mut buffer,
Some(&mut node_map),
)?;
for o in children.iter_mut() {
assert!(o.as_usize() < node_map.len());
*o = node_map[usize::try_from(*o)?];
assert!(!o.is_null());
}
//if let Some(idmap) =
// tables.simplify(children, tskit::SimplificationOptions::default(), true)?
//{
// // remap child nodes
// for o in children.iter_mut() {
// *o = idmap[usize::try_from(*o)?];
// }
//}
buffer.post_simplification(children, &mut tables)?;
}
std::mem::swap(&mut parents, &mut children);
}

tables.build_index()?;
let treeseq = tables.tree_sequence(tskit::TreeSequenceFlags::default())?;

Ok(treeseq)
}
// ANCHOR_END: haploid_wright_fisher_edge_buffering

#[derive(Clone, clap::Parser)]
struct SimParams {
seed: u64,
popsize: usize,
num_generations: i32,
simplify_interval: i32,
treefile: Option<String>,
}

fn main() -> Result<()> {
let params = SimParams::parse();
let treeseq = simulate(
params.seed,
params.popsize,
params.num_generations,
params.simplify_interval,
)?;

if let Some(treefile) = &params.treefile {
treeseq.dump(treefile, 0)?;
}

Ok(())
}

#[cfg(test)]
proptest! {
#[test]
fn test_simulate_proptest(seed in any::<u64>(),
num_generations in 50..100i32,
simplify_interval in 1..100i32) {
let ts = simulate(seed, 100, num_generations, simplify_interval).unwrap();

// stress test the branch length fn b/c it is not a trivial
// wrapper around the C API.
{
use streaming_iterator::StreamingIterator;
let mut x = f64::NAN;
if let Ok(mut tree_iter) = ts.tree_iterator(0) {
// We will only do the first tree to save time.
if let Some(tree) = tree_iter.next() {
let b = tree.total_branch_length(false).unwrap();
let b2 = unsafe {
tskit::bindings::tsk_tree_get_total_branch_length(tree.as_ptr(), -1, &mut x)
};
assert!(b2 >= 0, "{}", b2);
assert!(f64::from(b) - x <= 1e-8);
}
}
}
}
}
Loading