-
Notifications
You must be signed in to change notification settings - Fork 4
/
timing.txt
78 lines (73 loc) · 5.34 KB
/
timing.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
The purpose of this timing exercise is to highlight the bottlenecks. The actual values will depend on the hardware setup.
icgc
├── 00_data_download
│ ├── 01_download_icgc_dir_structure.py
│ ├── 02_download_somatic_files.py ~2hrs (~10Mbps download; after-the-fact estimate)
│ ├── 03_download_other_files.py
│ ├── 05_unzip.py
│ ├── 06_group_cancers.py
│
---------------------------------------------------------
├── 10_local_db_loading
│ ├── 05_find_max_field_length.py 40 min (can be skipped if you are working with ICGC v<=27
and/or trust the dimensions in icgc_utils/icgc.py)
│ ├── 06_make_tables.py <1 s
│ ├── 07_write_mutations_tsv.py ~55 min 10fold parallelization
│ ├── 08_write_other_tsv.py 2 s
│ ├── 09_load_mysql.py 22 min
│ ├── 10_make_indices_on_temp_tables.py 270 min, 12 fold pll (what's up with this?; parallelization bad)
│ ├── 15_assembly_check.py 2 min
│ ├── 16_donor_check.py 1 min
│ ├── 20_hgnc_name_resolution_table.py <1 s
│ ├── 22_ensembl_id_table.py 3 s
│ ├── 23_ensembl_coding_seqs.py 8 s
│ ├── 24_ucsc_gene_coords_table.py <1 min
--------------------------------------------------------
├── 20_local_db_reorganization
│ ├── 06_consequence_vocab.py ~3 mins
│ ├── 08_check_mut_tables_and_make_new_ones.py <1s
│ ├── 10_reorganize_variants.py ~22 min in total, 20 fold pll (advice: MyISAM)
from<1s to 22 mins (for MELA) per tumor type
│ ├── 11_delete_variants_from_normal.py 107 mins, 10 fold pll
│ ├── 13_reorganize_mutations.py 110 mins with 12 fold parallelization
chrom 2 takes ~90 min
│ ├── 14_reorganize_locations.py ~32 mins with 12 fold parallelization
chrom 1 takes ~25 min, the other chroms less
│ ├── 15_set_pathogenicity_in_mutation_tables.py ~15 min, 8 fold pll
│ ├── 16_pathg_annotation_from_mutations_to_variants.py 7 min, 8 fold pll
│ ├── 17_make_jumbo_index_on_new_tables.py ~20 min, no pll
│ ├── 18_cleanup_duplicate_entries.py 120 min, 10 fold pll
│ ├── 19_cleanup_duplicate_donors.py 2 min single cpu
│ ├── 21_add_reliability_annotation_to_somatic.py 7 min, 8 fold pll
│ ├── 22_cleanup_duplicate_specimens.py ~6 min, 8 fold pll
│ ├── 23_cleanup_duplicate_samples.py ~14 min, 21 fold pll
│ ├── 25_cleanup_duplicate_donors_by_variant_overlap.py ~5 min, no pll
│ ├── 27_copy_reliability_info_to_mutations.py ~10 min, 8 fold pll
│ ├── 29_index_on_mutation_tables.py ~6 min
│ ├── 30_mutation2gene_maps.py ~90 min, 12 fold pll
│ ├── 32_load_ensembl_coord_patches.py <1s
│ └── 34_reannotate_missense_mutations.py ~180 min, 12 fold pll
--------------------------------------------------------
├── 30_tcga_merge
│ ├── 28_add_tables_for_tcga_variants.py <1 min (didn't ime)
│ ├── 30_reorganize_tcga_variants.py ~60 min, 8 fold pll
│ ├── 34_cleanup_duplicate_specimens.py ~3 min, no pll
│ ├── 35_cleanup_duplicate_samples.py <1 min, 8 fold pll
│ ├── 36_cleanup_duplicate_donors_by_variant_overlap.py ~3 min, no pll
│ ├── 38_pathg_from_mutations_to_variants.py ~6 min, 8 fold pll
│ ├── 39_reannotate_missense_mutations.py ~12 min, 12 fold pll
│ └── 40_mutation2gene_maps.py ~6 min
--------------------------------------------------------
├── 40_housekeeping
│ ├── 33_drop_empty_tables.py <1min
│ ├── 35_gene_column_to_variants.py ~15min
│ ├── 37_database_stats.py ~1min
│ ├── 38_database_stats_2.py
│ ├── 40_var_sanity_checks.py <1s
│ ├── 45_cancer_types_table.py <1s
--------------------------------------------------------
├── 50_production
│ ├── 40_specimen_idx_on_varians.py 14min, no pll
│ ├── 42_gene_stats_generic.py
│ ├── 48_co-occurence <2 mins 12 fold pll
│ ├── 50_co-occurence_postprocess.py 10-20 mins depdnding on the gene and required precision