Generating the refgen and the annotations

Script to generate the final GFF3 file and the seqinfo

The genome assembly was generated as explained in Theulot et al., 2024. The original GFF file was produced by LRSDAY (v1.7).

The gene-to-name correspondence table was created from the S288C_reference_genome_R64-3-1_20210421.gff file downloaded from SGD.

One of the X elements mapped by LRSDAY on chrV was annotated as X_element_partial; we manually changed this annotation to X_element so that every telomere has at least one X element.
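The manual relabeling described above could also be scripted. The following is a minimal sketch only, assuming a tab-separated GFF3 with the feature type in column 3; the function name `relabel_feature`, the `LRSDAY` source label, the IDs, and the coordinates are all hypothetical and are not taken from the actual annotation.

```python
def relabel_feature(gff_lines, seqid, start,
                    old_type="X_element_partial", new_type="X_element"):
    """Return GFF3 lines with the type (column 3) of one feature changed.

    The feature is selected by sequence ID, start coordinate, and current type;
    comment lines and all other features are passed through unchanged.
    """
    out = []
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if (not line.startswith("#") and len(cols) == 9
                and cols[0] == seqid and cols[2] == old_type
                and int(cols[3]) == start):
            cols[2] = new_type
            line = "\t".join(cols)
        out.append(line)
    return out


# Hypothetical records, for illustration only
example = [
    "##gff-version 3",
    "chrV\tLRSDAY\tX_element_partial\t100\t400\t.\t+\t.\tID=example_1",
    "chrV\tLRSDAY\tX_element_partial\t9000\t9300\t.\t-\t.\tID=example_2",
]
relabeled = relabel_feature(example, "chrV", 100)
```

Selecting the feature by chromosome and start coordinate keeps the change limited to a single record, which matches the one-off nature of the manual edit.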

Telomeric repeats were annotated with telofinder.

ARS annotation was performed as follows:

  • ARS positions were downloaded from OriDB in December 2023 and renamed so that each ARS had a unique name.
  • The corresponding DNA sequences were extracted from the sacCer1 reference genome used by OriDB.
  • The DNA sequences were mapped onto our reference genome using bwa mem (v0.7.17-r1198-dirty), and the resulting BAM file was converted to a BED file using bedtools bamtobed (v2.26.0).
  • ARSs that mapped to a different chromosome in our genome than in OriDB were discarded.
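The last filtering step can be sketched as follows. This is an illustration under stated assumptions, not the script actually used: it assumes the BED file has the six columns written by bedtools bamtobed (chromosome, start, end, read name, score, strand), that the read name is the unique ARS name, and that chromosome names have already been harmonized between OriDB and the new assembly. The function name `filter_same_chromosome`, the ARS names, and the coordinates are hypothetical.

```python
def filter_same_chromosome(bed_lines, oridb_chrom):
    """Keep BED records whose chromosome matches the OriDB chromosome.

    bed_lines   -- iterable of tab-separated BED lines (name in column 4)
    oridb_chrom -- dict mapping each unique ARS name to its OriDB chromosome
    """
    kept = []
    for line in bed_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        chrom, name = fields[0], fields[3]
        if oridb_chrom.get(name) == chrom:
            kept.append(line)
    return kept


# Hypothetical mappings, for illustration only
bed = [
    "chrIV\t1000\t1250\tARS_example_1\t60\t+",
    "chrVII\t500\t740\tARS_example_2\t60\t-",
]
oridb = {"ARS_example_1": "chrIV", "ARS_example_2": "chrII"}
kept = filter_same_chromosome(bed, oridb)
```

In this toy input only `ARS_example_1` is retained, because `ARS_example_2` maps to chrVII while its OriDB chromosome is chrII.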

rDNA sequences were extracted from the S288C reference genome (R64-3-1) and mapped onto the BT1 genome using bwa mem (v0.7.17-r1198-dirty); the resulting BAM file was converted to a BED file using bedtools bamtobed (v2.26.0).

pBL-hsvTKco-hENT1co sequences were mapped onto the BT1 genome using bwa mem (v0.7.17-r1198-dirty); the resulting BAM file was converted to a BED file using bedtools bamtobed (v2.26.0).

All these features were combined in the BT1multiUra.gff3 file.

The seqinfo file was generated using the following R code:
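Combining BED-derived features with a GFF3 annotation requires a coordinate conversion: BED intervals are 0-based and half-open, whereas GFF3 intervals are 1-based and inclusive. The sketch below shows that conversion for a single record. How the features were actually merged is not described here, so the function name `bed_to_gff3`, the `bwa_mem` source label, the feature type, and the example record are assumptions for illustration.

```python
def bed_to_gff3(bed_line, feature_type, source="bwa_mem"):
    """Convert one 6-column BED line into a 9-column GFF3 line.

    BED start is 0-based, so 1 is added; BED end is exclusive, which equals
    the inclusive 1-based GFF3 end, so it is copied unchanged.
    """
    chrom, start, end, name, _score, strand = bed_line.rstrip("\n").split("\t")[:6]
    gff_start = int(start) + 1
    return "\t".join([
        chrom, source, feature_type,
        str(gff_start), end,
        ".",            # score left undefined
        strand,
        ".",            # phase only applies to CDS features
        f"ID={name}",
    ])


# Hypothetical record, for illustration only
gff_line = bed_to_gff3("chrXII\t450000\t459137\trDNA_example\t60\t+", "rDNA")
```

Names containing characters reserved in GFF3 attributes (such as `;`, `=`, or `,`) would need to be percent-encoded before being written to column 9; the sketch does not handle that case.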

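library(Biostrings)    # provides readDNAStringSet()  
library(GenomeInfoDb)  # provides Seqinfo()  
BT1_genome <- readDNAStringSet("")  # set to the path of the BT1 genome FASTA  

# The Seqinfo() arguments are not given here; taking names and lengths from the FASTA is an assumption  
seqinf <- Seqinfo(seqnames = names(BT1_genome), seqlengths = width(BT1_genome))  
seqnames(seqinf)[17] <- "chrM"  # rename the 17th sequence to chrM  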
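For readers without a Bioconductor installation, the same bookkeeping (sequence names, sequence lengths, and renaming one entry by position) can be sketched in plain Python. This is an analogue for illustration, not the code used to produce the seqinfo file; the function names `fasta_lengths` and `rename_nth` and the toy FASTA are hypothetical.

```python
def fasta_lengths(fasta_text):
    """Return {sequence name: length} in file order from FASTA text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the first word of the header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def rename_nth(lengths, n, new_name):
    """Rename the n-th sequence (1-based, as in R) while preserving order."""
    return {
        (new_name if i == n else key): value
        for i, (key, value) in enumerate(lengths.items(), start=1)
    }


# Toy FASTA with three sequences, for illustration only
toy = ">chrI some description\nACGT\nAC\n>chrII\nGG\n>contig_3\nTTTAA\n"
info = rename_nth(fasta_lengths(toy), 3, "chrM")
```

With the toy input, the third sequence is renamed to `chrM` and the lengths are unchanged; for the real genome the position would be 17, as in the R code.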
