This is an automatically generated, ranked list of open source software from pharmaceutical companies and cross-organization collaborations, biotechnology companies, research institutes, open source communities and individuals, plus some life-science software from technology companies.
It is built from a curated list of GitHub accounts and is refreshed periodically from those accounts' repositories.
You can also see what these accounts have updated recently and which topics the software covers.
Note: each entry lists the following, where available:

- stars: number of people who especially appreciated the repository
- forks: number of people who have forked (copied) the repository in order to modify it
- watchers: number of people who are monitoring changes in the repository
- main programming language
- license
- last update date & time
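Judging from the tied entries in the table below (for example, two repositories with 356 stars share rank 67 and the next one is ranked 68), the ranks appear to follow a dense ranking by star count. The sketch below illustrates that scheme; the `Repo` class and `dense_rank` function are hypothetical names introduced here for illustration, not the generator's actual code.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    """Hypothetical record holding only the fields needed for ranking."""
    name: str
    stars: int


def dense_rank(repos):
    """Return (rank, repo) pairs sorted by stars, highest first.

    Repositories with equal star counts share a rank, and the next
    distinct star count gets the next consecutive rank (dense ranking).
    """
    ordered = sorted(repos, key=lambda r: r.stars, reverse=True)
    ranked = []
    rank = 0
    previous_stars = None
    for repo in ordered:
        if repo.stars != previous_stars:
            rank += 1
            previous_stars = repo.stars
        ranked.append((rank, repo))
    return ranked


# Star counts taken from three adjacent rows of the table.
sample = [
    Repo("shenwei356/taxonkit", 342),
    Repo("MolecularAI/GraphINVENT", 356),
    Repo("chembl/chembl_webresource_client", 356),
]

for rank, repo in dense_rank(sample):
    print(rank, repo.name, repo.stars)
```

In this sample the two repositories with 356 stars share the first rank and the remaining one gets the next rank, which mirrors the 67, 67, 68 pattern in the table.
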
Rank | Software |
---|---|
1 | google-deepmind/alphafold Open source code for AlphaFold. 11987 2135 226 Python Apache-2.0 license 2023-04-05 09:45:53 |
2 | deepchem/deepchem Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology biology , deep-learning , drug-discovery , hacktoberfest , materials-science , quantum-chemistry 5220 1626 Python MIT License 2024-06-08 13:03:11 |
3 | biopython/biopython Official git repository for Biopython (originally converted from CVS) bioinformatics , biopython , dna , genomics , phylogenetics , protein , protein-structure , python , sequence-alignment 4213 1728 168 Python Unknown LICENSE |
4 | google/deepvariant DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. bioinformatics , deep-learning , deep-neural-network , deepvariant , dna , genome , genomics , machine-learning , ngs , science , sequencing , tensorflow 3100 698 159 Python BSD-3-Clause license 2024-03-19 19:20:10 |
5 | facebookresearch/esm Evolutionary Scale Modeling (esm): Pretrained language models for proteins 2917 577 63 Python MIT license 2022-10-18 13:38:47 |
6 | aqlaboratory/openfold Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2 alphafold2 , protein-structure , pytorch 2572 466 Python Apache License 2.0 2024-06-04 08:33:28 |
7 | rdkit/rdkit The official sources for the RDKit library c-plus-plus , cheminformatics , python , rdkit 2483 845 HTML BSD 3-Clause "New" or "Revised" License 2024-06-08 03:18:22 |
8 | AstraZeneca/awesome-explainable-graph-reasoning A collection of research papers and software related to explainability in graph machine learning. awesome-list , deep-learning , explainable-ai , explainable-ml , graph , graph-algorithms , graphml 1941 129 Apache License 2.0 2022-04-04 14:54:08 |
9 | OpenGene/fastp An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...) adapter , bioinformatics , duplication , fastq , filter , filtering , illumina , merging , ngs , overlap , polyg , preprocessing , qc , quality , quality-control , sequencing , splitting , trimming , umi 1803 333 C++ MIT License 2024-04-07 08:16:11 |
10 | scverse/scanpy Single-cell analysis in Python. Scales to >1M cells. anndata , bioinformatics , data-science , machine-learning , python , scanpy , scverse , transcriptomics , visualize-data 1789 579 Python BSD 3-Clause "New" or "Revised" License 2024-06-07 08:43:34 |
11 | lh3/minimap2 A versatile pairwise aligner for genomic and spliced nucleotide sequences bioinformatics , genomics , sequence-alignment , spliced-alignment 1708 396 C Other 2024-05-22 19:58:33 |
12 | allenai/scispacy A full spaCy pipeline and models for scientific/biomedical documents. bioinformatics , biomedical , custom-pipes , nlp , scientific-documents , spacy 1629 221 52 Python Apache-2.0 license 2024-03-08 05:57:56 |
13 | broadinstitute/gatk Official code repository for GATK versions 4 and up bioinformatics , dna , gatk , genome , genomics , ngs , science , sequencing , spark 1621 577 156 Java specific 2023-12-13 22:53:56 |
14 | bioconda/bioconda-recipes Conda recipes for the bioconda channel. bioinformatics , conda , hacktoberfest , package-management 1595 3089 96 Shell MIT license |
15 | samtools/samtools Tools (written in C using htslib) for manipulating next-generation sequencing data 1572 572 C Other 2024-06-07 09:32:59 |
16 | Slicer/Slicer Multi-platform, free open source software for visualization and image computing. 3d-printing , 3d-slicer , c-plus-plus , computed-tomography , image-guided-therapy , image-processing , itk , kitware , medical-image-computing , medical-imaging , national-institutes-of-health , neuroimaging , nih , python , qt , registration , segmentation , tcia-dac , tractography , vtk 1521 520 38 C++ specific |
17 | lh3/bwa Burrow-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment) bioinformatics , fm-index , genomics , sequence-alignment 1468 547 C GNU General Public License v3.0 2024-04-15 02:54:32 |
18 | DeepGraphLearning/torchdrug A powerful and flexible machine learning platform for drug discovery deep-learning , drug-discovery , graph-neural-networks , pytorch 1407 194 31 Python Apache-2.0 license 2023-07-16 22:37:17 |
19 | lh3/seqtk Toolkit for processing sequences in FASTA/Q formats bioinformatics , sequence-analysis 1332 310 C MIT License 2023-10-24 15:01:39 |
20 | galaxyproject/galaxy Data intensive science for everyone. bioinformatics , dna , genomics , hacktoberfest , ngs , pipeline , science , sequencing , usegalaxy , workflow , workflow-engine 1329 967 69 Python specific 2024-05-07 13:56:26 |
21 | schrodinger/fixed-data-table-2 A React table component designed to allow presenting millions of rows of data. 1290 289 JavaScript Other 2024-05-23 05:13:10 |
22 | soedinglab/MMseqs2 MMseqs2: ultra fast and sensitive search and clustering suite alignment , bioinformatics , blast , linclust , metagenomics , mmseqs , profile-search , sequence-clustering , sequence-search , taxonomy 1281 181 C GNU General Public License v3.0 2024-05-23 07:07:21 |
23 | facebookresearch/fastMRI A large-scale dataset of both raw MRI measurements and clinical MRI images. convolutional-neural-networks , deep-learning , fastmri , fastmri-challenge , fastmri-dataset , medical-imaging , mri , mri-reconstruction , pytorch 1259 370 74 Python MIT license 2023-06-26 17:17:06 |
24 | greenelab/deep-review A collaboratively written review paper on deep learning, genomics, and precision medicine deep-learning , genomics , manubot , manuscript , neural-networks , review 1235 271 129 HTML Unknown LICENSE.md 2018-03-12 15:06:48 |
25 | shenwei356/seqkit A cross-platform and ultrafast toolkit for FASTA/Q file manipulation bioinformatics , cross-platform , fasta , fastq , golang , manipulation , sequence , tool , toolkit 1226 157 26 Go MIT license 2024-05-17 15:59:35 |
26 | MultiQC/MultiQC Aggregate results from bioinformatics analyses across many samples into a single report. analysis , bioconda , bioinformatics , data-visualization , multiqc , pypi , python , quality-control , reporting , seqera , vizualisation 1185 582 37 JavaScript GPL-3.0 license 2024-05-31 18:30:12 |
27 | dcm4che/dcm4che DICOM Implementation in JAVA 1165 637 119 Java specific 2024-04-22 10:59:11 |
28 | scverse/scvi-tools Deep probabilistic analysis of single-cell and spatial omics data cite-seq , deep-generative-model , deep-learning , human-cell-atlas , scrna-seq , scverse , single-cell-genomics , single-cell-rna-seq , variational-autoencoder , variational-bayes 1149 342 Python BSD 3-Clause "New" or "Revised" License 2024-06-05 17:01:13 |
29 | vgteam/vg tools for working with genome variation graphs dna , genome-graph , genomics , graph , variation-graph 1072 191 48 C++ specific 2024-05-20 18:50:28 |
30 | schrodinger/pymol-open-source Open-source foundation of the user-sponsored PyMOL molecular visualization system. 1071 260 C Other 2024-06-06 19:36:48 |
31 | scipipe/scipipe Robust, flexible and resource-efficient pipelines using Go and the commandline bioinformatics , bioinformatics-pipeline , cheminformatics , dataflow , fbp , go , golang , pipeline , scientific-workflows , scipipe , workflow , workflow-engine 1055 72 38 Go MIT license 2021-10-14 09:11:34 |
32 | shenwei356/csvtk A cross-platform, efficient and practical CSV/TSV toolkit in Golang bioinformatics , command-line , cross-platform , csv , golang , tool , toolkit , tsv 972 85 25 Go MIT license 2024-05-29 15:30:38 |
33 | bigdatagenomics/adam ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed. avro , big-data , bioinformatics , genomics , java , parquet , python , r , scala , spark 967 304 Scala Apache License 2.0 2024-03-23 13:27:52 |
34 | broadinstitute/cromwell Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments application , bioinformatics , cloud , containers , docker , executor , ga4gh , hpc , scala , wdl , workflow , workflow-description-language , workflow-execution 965 351 112 Scala BSD-3-Clause LICENSE.txt 2024-05-07 17:47:13 |
35 | hail-is/hail Cloud-native genomic dataframes and batch computing bioinformatics , genetics , genomics , gwas , hail , python , software , vcf 946 238 55 Python MIT license 2024-06-05 17:48:05 |
36 | broadinstitute/picard A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. 944 365 160 Java MIT license 2023-11-14 22:01:18 |
37 | aqlaboratory/proteinnet Standardized data set for machine learning of protein structure dataset , deep-learning , machine-learning , protein-sequence , protein-structure , proteins 849 130 Python MIT License 2020-11-18 23:43:32 |
38 | shenwei356/rush A cross-platform command-line tool for executing jobs in parallel bioinformatics , command , cross-platform , execute , golang , parallel , pipeline , shell , windows 834 63 20 Go MIT license 2023-11-13 17:53:58 |
39 | evo-design/evo DNA foundation modeling from molecular to genome scale 832 97 Jupyter Notebook Apache License 2.0 2024-04-30 22:35:34 |
40 | PaddlePaddle/PaddleHelix Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning ("PaddleHelix" bio-computing toolkit) biocomputing , ddi , deeplearning , dti , graph-networks , machine-learning , molecule-design , ppi , protein-design , protein-docking , protein-folding , protein-structure-prediction , representation-learning , rna-structure-prediction , self-supervised-learning 799 188 25 Python Apache-2.0 license 2023-08-01 09:31:36 |
41 | samtools/htslib C library for high-throughput sequencing data formats bam , bcf , bioinformatics , cram , htslib , ngs , sam , vcf 779 448 C Other 2024-06-06 15:40:15 |
42 | google/nucleus Python and C++ code for reading and writing genomics data. bioinformatics , dna , genomics , tensorflow 777 126 53 C++ specific 2021-08-31 23:19:33 |
43 | nroduit/Weasis Weasis is a DICOM viewer available as a desktop application or as a web-based application. dicom , dicom-image , dicom-image-viewer , dicom-images , dicom-pr , dicom-rt , dicom-seg , dicom-viewer , dicom-web-viewer , dicomweb , ecg , export-dicom , medical , medical-imaging , multiplanar-reconstruction , viewer , volume-rendering , weasis 763 281 49 Java specific 2024-05-06 18:42:54 |
44 | baidu-research/NCRF Cancer metastasis detection with neural conditional random field (NCRF) camelyon16 , conditional-random-fields , deep-learning , pathology , whole-slide-imaging 749 184 37 Python Apache-2.0 license 2018-06-17 18:22:34 |
45 | AstraZeneca/chemicalx A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022) biology , chemistry , deep-chemistry , deep-learning , drug , drug-discovery , drug-interaction , drug-pair , geometric-deep-learning , geometry , graph-neural-network , machine-learning , pharma , polypharmacy , pytorch , smiles , smiles-strings , torch , torchdrug 701 89 Python Apache License 2.0 2023-09-11 08:01:43 |
46 | samtools/hts-specs Specifications of SAM/BAM and related high-throughput sequencing file formats 627 173 TeX 2024-06-06 06:50:26 |
47 | samtools/bcftools This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html 626 241 C Other 2024-06-07 13:13:17 |
48 | insilicomedicine/GENTRL Generative Tensorial Reinforcement Learning (GENTRL) model 596 216 Python 2020-04-28 11:58:05 |
49 | shenwei356/awesome Awesome resources on Bioinformatics, data science, machine learning, programming language (Python, Golang, R, Perl) and miscellaneous stuff. awesome , data-science , git , golang , linux , perl , programing-language , python 593 163 35 MIT license 2023-09-25 02:09:01 |
50 | chanzuckerberg/cellxgene An interactive explorer for single-cell transcriptomics data dataviz , scientific , scrna-seq , transcriptomics , visualization 591 111 33 JavaScript MIT license 2023-12-19 22:19:07 |
51 | invesalius/invesalius3 3D medical imaging reconstruction software 584 277 37 Python GPL-2.0 license 2022-04-14 02:28:31 |
52 | lh3/bioawk BWK awk modified for biological data bioinformatics , sequence-analysis 582 121 C 2022-08-11 01:06:45 |
53 | MolecularAI/aizynthfinder A tool for retrosynthetic planning astrazeneca , chemical-reactions , cheminformatics , monte-carlo-tree-search , neural-networks , reaction-informatics 548 125 Python MIT License 2024-06-03 13:34:33 |
54 | owkin/PyDESeq2 A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA. bioinformatics , differential-expression , python , rna-seq , transcriptomics 533 58 Python MIT License 2024-06-06 01:43:52 |
55 | broadinstitute/infercnv Inferring CNV from Single-Cell RNA-Seq 520 159 42 R specific 2020-02-07 20:29:28 |
56 | scverse/anndata Annotated data. anndata , bioinformatics , data-science , machine-learning , scanpy , scverse , transcriptomics 511 148 Python BSD 3-Clause "New" or "Revised" License 2024-06-07 16:03:50 |
57 | soedinglab/hh-suite Remote protein homology detection suite. alignment , bioinformatics , cpp , hh-suite , hhblits , hhpred , hhsearch , opensource , profile-profile-search , profile-search , protein-structure , sequence-search , simd , viterbi 509 128 C GNU General Public License v3.0 2023-08-13 08:44:05 |
58 | chhylp123/hifiasm Hifiasm: a haplotype-resolved assembler for accurate Hifi reads bioinformatics , denovo-assembly , genomics , hifi-read , pacbio 490 84 28 C++ MIT license 2024-05-06 14:29:45 |
59 | insitro/redun Yet another redundant workflow engine aws , bioinformatics , data-engineering , data-science , docker , etl , gcp , ml , python , workflow-engine 489 40 Python Apache License 2.0 2024-06-06 18:52:56 |
60 | biosustain/potion Flask-Potion is a RESTful API framework for Flask and SQLAlchemy, Peewee or MongoEngine flask , flask-extensions , mongoengine , peewee , sqlalchemy 488 51 Python Other 2019-04-23 17:00:39 |
61 | google-deepmind/alphamissense 461 58 25 Python Apache-2.0 license |
62 | scverse/squidpy Spatial Single Cell Analysis in Python data-visualization , image-analysis , single-cell-genomics , single-cell-rna-seq , spatial-analysis , spatial-transcriptomics , squidpy 399 71 Python BSD 3-Clause "New" or "Revised" License 2024-06-08 21:22:47 |
63 | lh3/minigraph Sequence-to-graph mapper and graph generator bioinformatics , genome-graph , genomics , pan-genome , sequence-alignment 394 38 C MIT License 2024-05-22 00:59:12 |
64 | benevolentAI/guacamol Benchmarks for generative chemistry 383 82 Python MIT License 2024-02-11 08:59:38 |
65 | calico/basenji Sequential regulatory activity predictions with deep convolutional neural networks. 373 119 Python Apache License 2.0 2024-05-28 20:08:23 |
66 | ome/bioformats Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software. bio-formats , format-converter , format-reader , image , java , life-sciences-image , lightsheet , metadata , whole-slide-imaging , wsi 367 239 Java GNU General Public License v2.0 2024-06-07 19:34:33 |
67 | MolecularAI/GraphINVENT Graph neural networks for molecular design. 356 74 Python MIT License 2023-03-11 11:55:32 |
67 | chembl/chembl_webresource_client Official Python client for accessing ChEMBL API chembl , cheminformatics , chemistry , chemoinformatics , python , rest , rest-client 356 95 Python Other 2024-02-26 15:44:57 |
68 | shenwei356/taxonkit A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV bioinformatics , cross-platform , lca , lineage , taxdump , taxid , taxonkit , taxonomy 342 29 10 Go MIT license 2024-04-25 17:15:34 |
69 | deepchem/DeepLearningLifeSciences Example code from the book "Deep Learning for the Life Sciences" 338 150 Jupyter Notebook MIT License 2021-09-17 05:10:37 |
70 | MolecularAI/Reinvent astrazeneca , cheminformatics , denovo-design , neural-networks , reinforcement-learning , transfer-learning 332 108 Python Apache License 2.0 2023-10-19 05:26:16 |
71 | aqlaboratory/rgn Recurrent Geometric Networks for end-to-end differentiable learning of protein structure deep-learning , deep-neural-networks , protein-structure , protein-structure-prediction 326 89 Python MIT License 2019-08-01 14:17:59 |
72 | tencent-ailab/grover This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data 313 68 7 Python specific 2021-01-18 09:06:32 |
73 | lh3/miniprot Align proteins to genomes with splicing and frameshift bioinformatics , sequence-alignment 305 16 C MIT License 2024-04-12 21:01:25 |
74 | Roche/pyreadstat Python package to read sas, spss and stata files into pandas data frames. It is a wrapper for the C library readstat. conversion , pandas-dataframe , python , readstat , sas7bdat , spss , stata-files 303 55 C Other 2024-06-04 09:55:07 |
75 | lh3/miniasm Ultrafast de novo assembly for long noisy reads (though having no consensus step) bioinformatics , denovo-assembly , genomics 293 68 TeX MIT License 2023-12-13 01:35:58 |
76 | chanzuckerberg/MedMentions A corpus of Biomedical papers annotated with mentions of UMLS entities. 291 31 25 |
77 | AstraZeneca/rexmex A general purpose recommender metrics library for fair evaluation. coverage , deep-learning , evaluation , machine-learning , metric , metrics , mrr , personalization , precision , rank , ranking , recall , recommender , recommender-system , recsys , rsquared 275 25 Python 2023-08-22 09:22:20 |
78 | samtools/htsjdk A Java API for high-throughput sequencing data (HTS) formats. bam , cram , dna , fasta , genomics , java , java-api , ngs , sam , sequencing , vcf 274 244 Java 2024-06-04 18:40:43 |
79 | shenwei356/brename A practical cross-platform command-line tool for safely batch renaming files/directories via regular expression batch , batch-rename , batch-rename-files , batch-renamer , go , golang , rename , safe , windows 254 21 6 Go MIT license 2024-04-14 08:22:45 |
80 | lh3/wgsim Reads simulator bioinformatics , genomics 252 90 C 2021-09-03 14:58:22 |
81 | Acellera/htmd HTMD: Programming Environment for Molecular Discovery automate , drug-discovery , htmd , molecular-simulations 250 58 Rich Text Format Other 2024-06-07 15:24:26 |
82 | DeepGraphLearning/GearNet GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125) graph-neural-networks , pre-training , protein-representation-learning 249 26 10 Python MIT license |
83 | MolecularAI/REINVENT4 AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization. ai , astrazeneca , cheminformatics , chemistry , deep-learning , denovo-design , drug-design , drug-discovery , generative-ai , ml , molecule-generation , neural-networks , reinforcement-learning , transfer-learning 247 57 Python Apache License 2.0 2024-04-27 11:00:08 |
84 | rdkit/rdkit-tutorials Tutorials to learn how to work with the RDKit 239 71 Jupyter Notebook Other 2023-03-19 13:36:55 |
85 | insightsengineering/rtables Reporting tables with R pharmaceuticals , r , tables 213 49 R Other 2024-06-07 21:27:39 |
86 | Bayer-Group/cloudformation-template-generator A type-safe Scala DSL for generating CloudFormation templates 211 71 Scala BSD 3-Clause "New" or "Revised" License 2022-07-29 11:32:04 |
87 | pharmaverse/admiral ADaM in R Asset Library cdisc , clinical-trials , open-source , r 207 53 R Apache License 2.0 2024-06-07 18:23:44 |
87 | OpenGene/awesome-bio-datasets awesome-bio-datasets 207 42 MIT License 2017-10-28 12:32:15 |
88 | OpenGene/AfterQC Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data adapter-trimming , bioinformatics , error , fastq , filtering , ngs , overlap , qc , quality-control , sequencing , trimming 203 50 Python MIT License 2020-05-14 07:15:54 |
89 | Bayer-Group/etcd-aws-cluster A container to assist in managing a etcd2 cluster from an Amazon auto scaling group 202 102 Shell BSD 3-Clause "New" or "Revised" License 2017-02-01 01:09:05 |
89 | modernatx/seqlike Unified biological sequence manipulation in Python biological-sequences , biopython , machine-learning , sequence 202 18 Python Apache License 2.0 2024-02-16 13:13:05 |
89 | scverse/scirpy A scanpy extension to analyse single-cell TCR and BCR data. 202 31 Python BSD 3-Clause "New" or "Revised" License 2024-06-06 06:21:35 |
90 | lh3/gfatools Tools for manipulating sequence graphs in the GFA and rGFA formats bioinformatics , genome-graph , genomics 201 18 C 2024-02-20 15:29:14 |
90 | scverse/muon muon is a multimodal omics Python framework anndata , cite-seq , mudata , multi-omics , multimodal-data , multimodal-omics-analysis , muon , scanpy , scatac-seq , scrna-seq , scverse 201 28 Python BSD 3-Clause "New" or "Revised" License 2024-05-30 21:21:35 |
91 | aws-samples/aws-batch-genomics Software sets up and runs an genome sequencing analysis workflow using AWS Batch and AWS Step Functions. 199 75 39 Python Apache-2.0 license 2018-11-29 18:40:42 |
92 | rdkit/mmpdb A package to identify matched molecular pairs and use them to predict property changes. 195 53 Python Other 2024-04-30 10:55:30 |
93 | Acellera/moleculekit MoleculeKit: Your favorite molecule manipulation kit drug-discovery , machine-learning , molecular-modeling , molecular-simulation , molecule , proteins 193 35 Python Other 2024-06-04 13:53:30 |
94 | bioinform/somaticseq An ensemble approach to accurately detect somatic mutations using SomaticSeq cancer-genomics , somatic-variants 189 53 Python BSD 2-Clause "Simplified" License 2024-05-30 07:55:34 |
95 | MolecularAI/Chemformer 188 34 Python Apache License 2.0 2024-05-29 14:43:33 |
96 | owkin/FLamby Cross-silo Federated Learning playground in Python. Discover 7 real-world federated datasets to test your new FL strategies and try to beat the leaderboard. dataset , deep-learning , differential-privacy , federated-learning , healthcare , machine-learning , python 187 22 Python MIT License 2024-06-03 12:18:27 |
96 | ome/openmicroscopy OME (Open Microscopy Environment) develops open-source software and data format standards for the storage and manipulation of biological light microscopy data. A joint project between universities, research establishments and industry in Europe and the USA, OME has over 20 active researchers with strong links to the microscopy community. Funded … database , image , java , omero , python , server 187 100 Java GNU General Public License v2.0 2024-06-08 00:39:30 |
97 | AstraZeneca-NGS/VarDict VarDict 186 60 Perl MIT License 2024-01-05 14:06:13 |
97 | scverse/spatialdata An open and interoperable data framework for spatial omics data 186 34 Python BSD 3-Clause "New" or "Revised" License 2024-06-08 00:23:48 |
98 | haowenz/chromap Fast alignment and preprocessing of chromatin profiles bioinformatics , chromatin-profiles , genomics , sequence-analysis 184 18 7 C++ MIT license 2024-02-06 15:29:20 |
99 | chao1224/MoleculeSTM Multi-modal Molecule Structure-text Model for Text-based Editing and Retrieval, Nat Mach Intell 2023 (https://www.nature.com/articles/s42256-023-00759-6) clip , computation-chemistry , drug-discovery , editing , foundation-model , molecule-editing , moleculeclip , moleculestm , pretraining , retrieval 182 17 4 Python specific 2024-04-19 05:25:24 |
100 | openpharma/visR A package to wrap functionality for plots, tables and diagrams adhering to graphical principles. 179 32 R Other 2024-06-04 13:48:59 |
100 | chembl/ChEMBL_Structure_Pipeline ChEMBL database structure pipelines 179 38 Python MIT License 2023-10-25 15:20:47 |
101 | AstraZeneca/awesome-drug-discovery-knowledge-graphs A collection of research papers, datasets and software related to knowledge graphs for drug discovery. Accompanies the paper "A review of biomedical datasets relating to drug discovery: a knowledge graph perspective" (Briefings in Bioinformatics, 2022) awesome-list , drug-discovery , drug-discovery-knowledge-graph , knowledge-graph 177 19 Apache License 2.0 2023-09-10 16:33:40 |
102 | lh3/biofast Benchmarking programming languages/implementations for common tasks in Bioinformatics bioinformatics 175 26 C 2021-12-09 14:10:44 |
103 | shenwei356/kmcp Accurate metagenomic profiling && Fast large-scale sequence/genome searching bigsi , cobs , fracminhash , kmer , metagenomics , scaled-minhash , searching , sketch , sketching , syncmers , taxonomic-classification , taxonomic-profiling , virome 173 13 6 Go MIT license 2023-09-22 04:09:54 |
104 | rgcgithub/regenie regenie is a C++ program for whole genome regression modelling of large genome-wide association studies. 172 49 C++ Other 2024-04-03 13:52:31 |
105 | soedinglab/metaeuk MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics bioinformatics , eukaryotes , gene-discovery , gene-prediction , metagenomics 171 24 C GNU General Public License v3.0 2024-05-30 09:04:06 |
106 | recursionpharma/gflownet GFlowNet library specialized for graph & molecular data deep-learning , gflownet , graph-neural-network , pytorch 168 34 Python MIT License 2024-06-06 13:29:06 |
106 | scverse/scanpy-tutorials Scanpy Tutorials. 168 113 Jupyter Notebook 2024-06-03 19:42:01 |
107 | bioinform/neusomatic NeuSomatic: Deep convolutional neural networks for accurate somatic mutation detection convolutional-neural-networks , deep-learning , genomics , somatic-variants 167 50 Python Other 2021-12-23 10:41:50 |
108 | lh3/readfq Fast multi-line FASTA/Q reader in several programming languages bioinformatics , sequence-analysis 166 60 C 2021-06-06 07:27:15 |
109 | insightsengineering/teal Exploratory Web Apps for Analyzing Clinical Trial Data clinical-trials , nest , r , shiny , webapp 164 29 R Other 2024-06-07 12:49:26 |
110 | lh3/cgranges A C/C++ library for fast interval overlap queries (with a "bedtools coverage" example) algorithm , bioinformatics , genomics 161 18 C MIT License 2024-05-28 21:47:37 |
110 | lh3/kmer-cnt Code examples of fast and simple k-mer counters for tutorial purposes bioinformatics , genomics , k-mer-counting 161 13 C++ MIT License 2020-03-10 16:24:06 |
111 | greenelab/tybalt Training and evaluating a variational autoencoder for pan-cancer gene expression data analysis , autoencoder , cancer , cancer-genomics , deep-learning , gene-expression , script , tool , unsupervised-learning , variational-autoencoder , variational-autoencoders 159 62 10 HTML BSD-3-Clause license 2017-11-13 13:38:42 |
112 | aqlaboratory/genie De Novo Protein Design by Equivariantly Diffusing Oriented Residue Clouds diffusion-models , protein-design 154 18 Python Apache License 2.0 2024-04-21 13:48:25 |
113 | DeepGraphLearning/ConfGF Implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021). 153 34 10 Python MIT license |
114 | benevolentAI/DeeplyTough DeeplyTough: Learning Structural Comparison of Protein Binding Sites 3d-models , deep-learning , drug-discovery , metric-learning , protein-structure 151 39 Python Other 2023-04-07 09:33:44 |
115 | chao1224/GraphMVP Pre-training Molecular Graph Representation with 3D Geometry, ICLR'22 (https://openreview.net/forum?id=xQUe1pOKPam) contrastive-learning , generative-model , geometry , graph , molecule , pretraining , self-supervised , self-supervised-learning 150 20 5 Python MIT license 2022-09-20 14:29:48 |
116 | OpenGene/MutScan Detect and visualize target mutations by scanning FastQ files directly bioinformatics , cancer , detection , fastq , mutation , ngs , somatic , validation , variant , visualization 146 38 C MIT License 2022-02-10 01:52:44 |
117 | MolecularAI/ReinventCommunity astrazeneca , cheminformatics , denovo-design , jupyter-notebook , neural-networks , reinforcement-learning , transfer-learning 145 57 Jupyter Notebook MIT License 2022-04-22 16:44:35 |
117 | lh3/psmc Implementation of the Pairwise Sequentially Markovian Coalescent (PSMC) model bioinformatics , genomics , population-genetics 145 60 C Other 2022-11-21 04:39:31 |
117 | tencent-ailab/DrugOOD OOD Dataset Curator and Benchmark for AI-aided Drug Discovery 145 19 6 Python specific |
118 | ome/ome-zarr-py Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud. ngff , ome , ome-zarr , zarr 143 51 Python Other 2024-06-06 12:51:57 |
119 | Novartis/tidymodules An Object-Oriented approach to Shiny modules communication , inheritance , oop , r , shiny , shiny-modules , tidy-operators 141 11 R Other 2023-02-23 15:04:31 |
120 | aws-samples/aws-genomics-workflows Genomics Workflows on AWS aws , batch , genomics , step-functions , workflows 140 106 19 Shell MIT-0 license 2022-03-30 21:38:09 |
121 | MolecularAI/deep-molecular-optimization Molecular optimization by capturing chemist’s intuition using the Seq2Seq with attention and the Transformer molecular-optimization , multi-property-optimization , seq2seq , transformer 139 36 Python Apache License 2.0 2023-03-16 07:05:06 |
122 | AstraZeneca/SubTab The official implementation of the paper, "SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning" contrastive-learning , multi-view-learning , representation-learning , self-supervised-learning , tabular-data 138 20 Python Apache License 2.0 2022-07-01 09:03:38 |
122 | johnsonandjohnson/Bodiless-JS Framework for building editable websites on the JAMStack 138 59 TypeScript Apache License 2.0 2024-01-24 03:00:32 |
123 | Benson-Genomics-Lab/TRF Tandem Repeats Finder: a program to analyze DNA sequences 137 24 C GNU Affero General Public License v3.0 2023-01-16 20:44:26 |
124 | lh3/pangene Constructing a pangenome gene graph bioinformatics , pangenome 136 7 C 2024-05-29 00:13:01 |
125 | owkin/HistoSSLscaling Code associated to the publication: Scaling self-supervised learning for histopathology with masked image modeling, A. Filiot et al., MedRxiv (2023). We publicly release Phikon 🚀 computational-pathology 135 11 Jupyter Notebook Other 2024-01-29 22:35:32 |
126 | AstraZeneca/awesome-shapley-value Reading list for "The Shapley Value in Machine Learning" (JCAI 2022) artificial-intelligence , data-science , deep-learning , explainability , explainable , explainable-ai , explainable-artificial-intelligence , explainable-ml , lime , machine-learning , owen-value , shap , shapley , shapley-additive-explanations , shapley-decomposition , shapley-q-value , shapley-value , xai 134 10 Apache License 2.0 2022-08-08 08:53:10 |
127 | lh3/bedtk A simple toolset for BED files (warning: CLI may change before bedtk becomes stable) bioinformatics 132 15 C MIT License 2024-05-28 21:48:28 |
128 | Bioconductor/Contributions Contribute Packages to Bioconductor bioconductor 131 33 2023-09-12 18:32:10 |
129 | Merck/BioPhi BioPhi is an open-source antibody design platform. It features methods for automated antibody humanization (Sapiens), humanness evaluation (OASis) and an interface for computer-assisted antibody sequence design. antibody , humanization , humanness , oasis , sapiens 129 44 Python MIT License 2024-06-03 07:17:18 |
129 | soedinglab/plass sensitive and precise assembly of short sequencing reads bioinformatics , metagenomics , metatranscriptomics , opensource , proteins , proteomics , sequence-assembler 129 14 C GNU General Public License v3.0 2024-04-16 20:44:12 |
130 | benevolentAI/guacamol_baselines Baselines models for GuacaMol benchmarks 128 33 Python MIT License 2024-02-16 09:40:42 |
131 | AstraZeneca-NGS/VarDictJava VarDict Java port 127 52 Java MIT License 2024-01-05 14:03:51 |
132 | lh3/ksw2 Global alignment and alignment extension bioinformatics , sequence-alignment 124 24 C Other 2023-06-27 17:21:12 |
132 | chao1224/ChatDrug LLM for Drug Editing, ICLR 2024 chatgpt , chatgpt3 , conversation , domain-feedback , drug , drug-discovery , drug-editing , editing , llm , molecule , motif , peptide , protein , retrieval , secondary-structure , small-molecule , structure 124 8 3 Python 2024-05-28 19:44:44 |
133 | rdkit/rdkit-js A powerful cheminformatics and molecule rendering toolbelt for JavaScript, powered by RDKit. cheminformatics , drug-discovery , javascript , molecule , molecule-viewer , molecule-visualization , node-js , npm , rdkit , react , wasm 123 35 Dockerfile BSD 3-Clause "New" or "Revised" License 2024-06-01 09:54:52 |
133 | blazerye/DrugAssist DrugAssist: A Large Language Model for Molecule Optimization ai-for-science , drug-discovery , instruction-datasets , instruction-tuning , large-language-models , molecule-generation , molecule-optimization 123 10 3 Python |
134 | bigdatagenomics/mango A scalable genome browser. Apache 2 licensed. 122 30 Scala Apache License 2.0 2022-12-02 22:21:57 |
135 | OpenGene/repaq A fast lossless FASTQ compressor with ultra-high compression ratio 120 20 C MIT License 2023-09-22 02:48:34 |
136 | Bioconductor/BiocStickers Stickers for some Bioconductor packages - feel free to contribute and/or modify. bioconductor , stickers 119 86 R Other 2024-05-10 05:58:21 |
136 | greenelab/pancancer Building classifiers using cancer transcriptomes across 33 different cancer-types analysis , cancer , classifier , gene-expression , machine-learning , methodology , pancancer , tcga , tool , transcriptome 119 58 10 Jupyter Notebook BSD-3-Clause license 2018-03-01 15:38:33 |
137 | Roche/BalancedLossNLP 118 23 Jupyter Notebook Other 2023-06-12 21:51:15 |
138 | Merck/deepbgc BGC Detection and Classification Using Deep Learning bidirectional-lstm , biosynthetic-gene-clusters , deep-learning , deepbgc , natural-products , pfam2vec , python , synthetic-biology 117 26 Jupyter Notebook MIT License 2023-11-11 12:48:56 |
138 | benevolentAI/MolBERT 117 35 Python MIT License 2021-06-06 10:28:35 |
139 | genentech/equifold Official code repository for EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation machine-learning , proteins , structural-biology , structure-prediction 116 15 Python Apache License 2.0 2023-01-08 19:51:30 |
140 | OpenGene/GeneFuse Gene fusion detection and visualization alk , bioinformatics , cancer , cosmic , eml4 , fusion , gene , ret , ros1 114 62 C MIT License 2022-02-21 08:07:06 |
141 | biosustain/cameo cameo - computer aided metabolic engineering & optimization 113 42 Python Apache License 2.0 2022-11-07 14:54:19 |
142 | EBI-Metagenomics/emg-viral-pipeline VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies cwl , nextflow , pipeline , viruses , workflow 109 13 Python Apache License 2.0 2024-05-08 20:10:03 |
142 | OpenGene/gencore Generate duplex/single consensus reads to reduce sequencing noises and remove duplications bioinformatics , consensus , deduplication , deep-sequencing , duplex , duplex-sequencing , duplication , ngs , sequencing , sequencing-error , sequencing-noise , somatic 109 32 C++ MIT License 2023-10-27 06:19:21 |
142 | OpenGene/fastv An ultra-fast tool for identification of SARS-CoV-2 and other microbes from sequencing data. This tool can be used to detect viral infectious diseases, like COVID-19. 2019-ncov , bioinformatics , coronavirus , covid , covid-19 , hcov , meta-genomics , microbial-sequences , mngs , ngs , sars-cov-2 , sequencing , viral , viral-infectious-diseases , virus , visualization 109 24 C++ MIT License 2023-10-27 06:16:38 |
143 | lh3/yak Yet another k-mer analyzer bioinformatics , k-mer 108 8 C MIT License 2024-04-01 21:39:44 |
143 | lh3/fermikit De novo assembly based variant calling pipeline for Illumina short reads bioinformatics , denovo-assembly , genomics , variant-calling 108 23 TeX Other 2020-11-30 22:57:56 |
144 | Merck/Halyard Halyard is an extremely horizontally scalable Triplestore with support for Named Graphs, designed for integration of extremely large Semantic Data Models, and for storage and SPARQL 1.1 querying of the whole Linked Data universe snapshots. 107 17 Java Apache License 2.0 2023-01-23 16:59:32 |
144 | ome/ngff Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud. bioimaging , cloud , data-science , file-formats , spec 107 38 Bikeshed Other 2024-06-02 06:26:47 |
144 | soedinglab/CCMpred Protein Residue-Residue Contacts from Correlated Mutations predicted quickly and accurately. 107 25 C GNU Affero General Public License v3.0 2023-11-08 07:51:35 |
145 | lh3/minimap This repo is DEPRECATED. Please use minimap2, the successor of minimap. 106 29 C MIT License 2017-09-20 14:15:02 |
146 | chao1224/Geom3D Geom3D: Geometric Modeling on 3D Structures, NeurIPS 2023 3d , 3d-structures , ai4science , biology , chemistry , crystals , drugs , equivariance , geometry , group , invariance , material , molecules , physics , proteins , symmetry 105 9 2 Python MIT license 2024-06-05 03:18:58 |
147 | phuse-org/phuse-scripts Delivery standard industry analyses, built upon CDISC standards for analysis data 104 88 SAS MIT License 2023-08-01 15:21:20 |
147 | chembl/FPSim2 Simple package for fast molecular similarity searches cheminformatics , chemistry , gpu , python , similarity-search 104 17 Python MIT License 2024-02-15 11:13:05 |
148 | bayer-science-for-a-better-life/Img2Mol 103 41 Jupyter Notebook Apache License 2.0 2023-03-24 18:07:41 |
149 | Biogen-Inc/tidyCDISC Demo the app here: https://bit.ly/tidyCDISC_app pharma , r , rinpharma , rstats 102 38 R GNU Affero General Public License v3.0 2023-09-22 15:18:20 |
150 | openpharma/mmrm Mixed Models for Repeated Measures (MMRM) in R. 100 17 R Other 2024-06-03 18:02:15 |
150 | MolecularAI/DockStream DockStream: A Docking Wrapper to Enhance De Novo Molecular Design astrazeneca , chemoinformatics , denovo-design , jupyter-notebook , molecular-docking , reinforcement-learning 100 30 Python Apache License 2.0 2023-03-16 07:07:10 |
150 | Bayer-Group/paquo PAthological QUpath Obsession - QuPath and Python conversations digital-pathology , python , qupath 100 16 Python GNU General Public License v3.0 2024-06-02 18:21:27 |
151 | genentech/gReLU gReLU is a python library to train, interpret, and apply deep learning models to DNA sequences. 99 5 Python MIT License 2024-06-07 20:29:13 |
152 | lh3/hickit TAD calling, phase imputation, 3D modeling and more for diploid single-cell Hi-C (Dip-C) and general Hi-C bioinformatics , genomics , hi-c 98 11 C 2021-02-04 01:47:43 |
153 | aqlaboratory/rgn2 97 28 Python 2023-11-28 17:16:23 |
154 | lh3/bgt Flexible genotype query among 30,000+ samples whole-genome bioinformatics , genomics 96 10 C MIT License 2019-09-04 19:43:27 |
154 | scverse/rapids_singlecell Rapids_singlecell: A GPU-accelerated tool for scRNA analysis. Offers seamless scverse compatibility for efficient single-cell data processing and analysis. anndata , bioinformatics , gpu , scverse , single-cell 96 18 Python MIT License 2024-06-03 18:07:06 |
154 | shenwei356/bio_scripts Practical, reusable scripts for bioinformatics bioinformatics , perl , python , reusable , script 96 65 Perl MIT License 2019-02-12 13:21:47 |
155 | EBISPOT/OLS Ontology Lookup Service from SPOT at EBI java , neo4j , obofoundry , owl , owl-api 95 40 JavaScript Apache License 2.0 2023-04-28 20:09:19 |
156 | Sanofi-Public/CodonBERT Repository for mRNA Paper and CodonBERT publication. 94 14 Python Other 2024-05-03 19:24:06 |
156 | OpenGene/scrnapip A Systematic and Dynamic Pipeline for Single-Cell RNA Sequencing Analysis 94 14 HTML 2023-10-16 01:24:06 |
157 | EBI-Metagenomics/genomes-catalogue-pipeline MGnify genome analysis pipeline 93 21 Python Other 2024-06-06 09:44:21 |
158 | samtools/tabix Note: tabix and bgzip binaries are now part of the HTSlib project. 92 40 C 2021-08-03 14:29:38 |
158 | shenwei356/BlackheartedHospital (forked from: open-power-workgroup/Hospital) A list of Putian-affiliated hospitals circulated online; updates welcome 92 15 2016-05-03 07:06:09 |
159 | AbSciBio/unlocking-de-novo-antibody-design 91 14 Other 2024-01-09 17:36:19 |
159 | schrodinger/gpusimilarity A Cuda/Thrust implementation of fingerprint similarity searching cheminformatics , chemistry , gpu , similarity-analysis 91 26 C++ BSD 3-Clause "New" or "Revised" License 2024-01-24 19:08:08 |
159 | lh3/dipcall Reference-based variant calling pipeline for a pair of phased haplotype assemblies 91 9 JavaScript MIT License 2021-06-06 20:36:10 |
160 | Bioconductor/CSAMA Course material for CSAMA: Statistical Data Analysis for Genome Scale Biology 89 45 HTML 2024-06-06 12:04:08 |
160 | AstraZeneca/onto_merger OntoMerger is an ontology alignment library for deduplicating knowledge graph nodes that represent the same domain. algorithm , alignment , biological-networks , biology , graph , kg , knowledge , knowledge-graph , mapping , ontology , ontology-alignment 89 5 HTML Apache License 2.0 2024-01-11 19:22:08 |
160 | hoelzer-lab/rnaflow A simple RNA-Seq differential gene expression pipeline using Nextflow 89 19 HTML GNU General Public License v3.0 2024-02-26 20:45:37 |
160 | shenwei356/perfect-bioinformatic-tools What should perfect bioinformatic tools be like? bioinformatics , cli , usability 89 1 Creative Commons Zero v1.0 Universal 2024-03-19 10:22:54 |
161 | Sanofi-IADC/whispr Open source event, comment and alert processing hub created by Sanofi IADC 88 8 TypeScript MIT License 2024-06-04 12:01:03 |
161 | calico/scBasset Sequence-based Modeling of single-cell ATAC-seq using Convolutional Neural Networks. 88 11 Jupyter Notebook Apache License 2.0 2024-02-08 19:20:16 |
161 | shenwei356/bio A lightweight and high-performance bioinformatics package in Golang bioinformatics , golang , minimizer , package , scaled-minhash , sequence , syncmer , taxdump , taxonomy 88 9 7 Go MIT license 2024-03-11 09:41:44 |
162 | owkin/HE2RNA_code Train a model to predict gene expression from histology slides. 87 39 Python GNU General Public License v3.0 2022-07-06 20:53:24 |
162 | scverse/pertpy Perturbation Analysis in the scverse ecosystem. perturbation , scverse , single-cell 87 19 Python MIT License 2024-06-08 08:07:34 |