-
Notifications
You must be signed in to change notification settings - Fork 1
/
VizGenomicsData.Rpres
1604 lines (1047 loc) · 57.1 KB
/
VizGenomicsData.Rpres
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Visualising Genomics Data
========================================================
author: MRC Clinical Sciences Centre
date: http://mrccsc.github.io/training.html
autosize: true
author: "MRC CSC Bioinformatics Core Team"
date:http://mrccsc.github.io/training.html
width: 1440
height: 1100
autosize: true
font-import: <link href='http://fonts.googleapis.com/css?family=Slabo+27px' rel='stylesheet' type='text/css'>
font-family: 'Slabo 27px', serif;
css:style.css
```{r setup, include=FALSE}
knitr::opts_chunk$set(cache=TRUE)
```
The Course
========================================================
* The Course
* [Importance of Visualising Genomics Data](#/vizdata).
* [Reminder of file types](#/filetypes)
* [Reminder of data types](#/datatypes)
* [Materials](#/materials)
* [Visualising genomics data in R](#/VizinR)
* [Plotting genome axis](#/genomeaxis)
* [Plotting genome data](#/datatracks)
* [Plotting genome annotation](#/Annotation)
* [Plotting genome sequence](#/seqtrack)
* [Plotting genomic alignments](#/genomeaxis)
* [Plotting from external databases](#/externaldata)
Importance of Visualising Genomics Data.
========================================================
id: vizdata
It is an essential step in genomics data analysis to visualise your data. This allows you to review data for both known or unexpected data characteristics and potential artefacts.
While we have discussed using IGV to review genomics data, now we will discuss how to do this while still working with in the R.
Visualising Genomics Data in R/Bioconductor.
========================================================
id: vizdataR
In complement to our [IGV genome browser course](http://mrccsc.github.io/IGV_course/) where we reviewed visualising genomics data in a browser, here we will use R/Bioconductor to produce publication quality graphics programatically.
Much of the material will require some familiarity with R and Bioconductor [(you can revisit our courses on those here)](http://mrccsc.github.io/) and these will be used in tight conjunction with tools introduced today such as the Bioconductor package, **Gviz**.
Reminder of file types
========================================================
id: filetypes
In this session we will be dealing with a range of data types. For more information on file types you can revisit our material.
* [File Formats](http://mrccsc.github.io/genomicFormats.html).
For more information on visualising genomics data in browsers you can visit our IGV course.
* [IGV](http://mrccsc.github.io/IGV_course/).
Reminder of data types in Bioconductor
========================================================
id: datatypes
We will also encounter and make use of many data structures and data types which we have seen throughout our courses on HTS data. You can revisit this material to refresh on HTS data analysis in Bioconductor and R below.
* [Bioconductor](http://mrccsc.github.io/Bioconductor/).
* [Alignments](https://mrccsc.github.io/Alignment/).
* [ChIP-seq](http://mrccsc.github.io/ChIPseq_short/).
* [RNA-seq](http://mrccsc.github.io/RNAseq_short/).
Materials.
========================================================
id: materials
All material for this course can be found on github.
* [Visualising Genomics Data](https://github.com/mrccsc/VisualisingGenomicsData)
Or can be downloaded as a zip archive from here.
* [Download zip](https://github.com/mrccsc/VisualisingGenomicsData/archive/master.zip)
Materials. - Presentations, source code and practicals.
========================================================
Once the zip file in unarchived. All presentations as HTML slides and pages, their R code and HTML practical sheets will be available in the directories underneath.
* **presentations/**
Presentations as an HTML slide show.
* **presentations/exercises/**
Some tasks/examples to work through.
Materials. - Data for presentations, practicals.
========================================================
All data to run code in the presentations and in the practicals is available in the zip archive. This includes coverage as bigWig files, aligned reads as BAM files and genomic intervals stored as BED files.
All data can be found under the **Data** directory
**Data/**
We also include some RData files containing precompiled results from querying database (in case of external server downtime). All RData files can be found in the RData directory
**RData/**
Set the Working directory
========================================================
Before running any of the code in the practicals or slides we need to set the working directory to the folder we unarchived.
You may navigate to the unarchived VisualisingGenomicsData folder in the Rstudio menu
**Session -> Set Working Directory -> Choose Directory**
or in the console.
```{r,eval=F}
setwd("/PathToMyDownload/VisualisingGenomicsData/")
# e.g. setwd("~/Downloads/VisualisingGenomicsData")
```
Why are we here?
========================================================
Genomics data can often be visualised in genome browsers such as the user friendly IGV genome browser.
This allows for the visualisation of our processed data in its genomic context.
In Genomics *(and most likely any Omics)*, it is important to review our data/results and hypotheses in a browser to identify patterns or potential artefacts discovered or missed within our analysis.
But we covered this??
========================================================
We have already discussed on using the IGV browser to review our data and get access to online data repositories.
IGV is quick, user friendly GUI to perform the essential task of review genomics data in its context.
For more information see our course on IGV [here](http://mrccsc.github.io/IGV_course/).
Then why not just use IGV?
========================================================
Using a genome browser to review sites of interest across the genome is a critical **first** step.
**Using processed and often indexed genomics data files**, IGV offers a method to rapidly interrogate genomics data along the linear genome.
**IGV does its job well** and should always be an immediate early step in data review. By being good at this however it **does not offer the flexibility** in displaying data we wish to achieve, more so **when expecting to review a large number of sites**.
Visualising Genomics Data around Genomic Features in R (Gviz)
========================================================
id:VizinR
The Gviz packages offers methods to produce publication quality plots of genomics data at genomic features of interest.
To get started using Gviz in some biological examples, first we need to install the package.
```{r, echo=T,eval=F}
## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("Gviz")
```
Getting started with Gviz -- Linear genome axis.
========================================================
id: genomeaxis
Gviz provides methods to plot many genomics data types (as with IGV) over genomic features and genomic annotation within a linear genomic reference.
The first thing we can do then is set up our linear axis representing positions on genomes.
For this we use our first function from **Gviz**, **GenomeAxisTrack()**.
Here we use the **name** parameter to set the name to be "myAxis".
```{r, echo=T}
library(Gviz)
genomeAxis <- GenomeAxisTrack(name="MyAxis")
genomeAxis
```
Getting started with Gviz -- Plotting the axis
========================================================
Now we have created a **GenomeAxisTrack** track object we can display the object using **plotTracks** function.
In order to display a axis track we need to set the limits of the plot *(otherwise where would it start and end?)*.
```{r, echo=T,eval=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100)
```
```{r, echo=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100,cex=3)
```
Getting started with Gviz -- Configuring the axis (part-1)
========================================================
It is fairly straightforward to create and render this axis.
Gviz offers a high degree of flexibility in the way these tracks can be plotted with some very useful plotting configurations included.
A useful feature is to add some information on the direction of the linear genome represented in this **GenomeAxisTrack**.
We can add labels for the 5' to 3' direction for the positive and negative strand by using the **add53** and **add35** parameters.
```{r, echo=T,eval=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100,
add53=T,add35=T)
```
```{r, echo=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100,
add53=T,add35=T,cex=3)
```
Getting started with Gviz -- Configuring the axis (part-2)
========================================================
We can also configure the resolution of the axis (albeit rather bluntly) using the **littleTicks** parameter.
This will add additional axis tick marks between those shown by default.
```{r, echo=T,eval=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100,
littleTicks = TRUE)
```
```{r, echo=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100,
littleTicks = TRUE,cex=3)
```
Getting started with Gviz -- Configuring the axis (part-3)
========================================================
By default the plot labels for the genome axis track are alternating below and above the line.
We can further configure the axis labels using the **labelPos** parameter.
Here we set the labelPos to be always below the axis
```{r, echo=T,eval=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100,
labelPos="below")
```
```{r, echo=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100,
labelPos="below",cex=3)
```
Getting started with Gviz -- Configuring the axis (part-4)
========================================================
In the previous plots we have produced a genomic axis which allows us to consider the position of the features within the linear genome.
In some contexts we may be more interested in relative distances around and between the genomic features being displayed.
We can configure the axis track to give us a relative representative of distance using the **scale** parameter.
```{r, echo=T,eval=F,fig.width=10,fig.height=5}
plotTracks(genomeAxis,from=100,to=10100,
scale=1,labelPos="below")
```
```{r, echo=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100,
scale=1,labelPos="below",cex=3)
```
Getting started with Gviz -- Configuring the axis (part-4b)
========================================================
We may want to add only a part of the scale (such as with Google Maps) to allow the reviewer to get a sense of distance.
We can specify how much of the total axis we wish to display as a scale using a value of 0 to 1 representing the proportion of scale to show.
```{r, echo=T,eval=F,fig.width=10,fig.height=5}
plotTracks(genomeAxis,from=100,to=10100,
scale=0.3)
```
```{r, echo=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100,
scale=0.3,cex=3)
```
Getting started with Gviz -- Configuring the axis (part-4c)
========================================================
We can also provide numbers greater than 1 to the **scale** parameter which will determine, in absolute base pairs, the size of scale to display.
```{r, echo=T,eval=F,fig.width=10,fig.height=5}
plotTracks(genomeAxis,from=100,to=10100,
scale=2500)
```
```{r, echo=F,fig.width=23,fig.height=3}
plotTracks(genomeAxis,from=100,to=10100,
scale=2500,cex=3)
```
Getting started with Gviz -- Axis and Regions of Interest (part-1)
========================================================
Previously we have seen how to highlight regions of interest in the scale bar for IGV.
These "regions of interest" may be user defined locations which add context to the scale and the genomics data to be displayed (e.g. Domain boundaries such as topilogically associated domains)
![ROI](imgs/igv_BookMarks.png)
Getting started with Gviz -- Axis and Regions of Interest (part-2)
========================================================
We can add "regions of interest" to the axis plotted by Gviz as we have done with IGV.
To do this we will need to define some ranges to signify the positions of "regions of interest" in the linear context of our genome track.
Since the plots have no apparent context for chromosomes (yet), we will use a IRanges object to specify "regions of interest" as opposed to the genome focused GRanges.
You can see our material [here](http://mrccsc.github.io/Bioconductor/) on Bioconductor objects for more information on IRanges and GRanges.
Brief recap (Creating an IRanges)
========================================================
To create an IRanges object we will load the IRanges library and specify vectors of **start** and **end** parameters to the **IRanges** constructor function.
```{r, echo=T,fig.width=10,fig.height=5}
library(IRanges)
regionsOfInterest <- IRanges(start=c(140,5140),end=c(2540,7540))
names(regionsOfInterest) <- c("ROI_1","ROI_2")
regionsOfInterest
```
Getting started with Gviz -- Axis and Regions of Interest (part-3)
========================================================
Now we have our IRanges object representing our regions of interest we can include them in our axis.
We will have to recreate our axis track to allow us to include these regions of interest.
Once we have updated our GenomeAxisTrack object we can plot the axis with regions of interest included.
```{r, echo=T,eval=F,fig.width=10,fig.height=5}
genomeAxis <- GenomeAxisTrack(name="MyAxis",
range = regionsOfInterest)
plotTracks(genomeAxis,from=100,to=10100)
```
```{r, echo=F,fig.width=23,fig.height=3}
genomeAxis <- GenomeAxisTrack(name="MyAxis",
range = regionsOfInterest)
plotTracks(genomeAxis,from=100,to=10100,cex=3)
```
Getting started with Gviz -- Axis and Regions of Interest (part-4)
========================================================
We include the names specified in the IRanges for the regions of interest within the axis plot by specify the **showID** parameter to TRUE.
```{r, echo=F,fig.width=23,fig.height=5}
plotTracks(genomeAxis,from=100,to=10100,
range=regionsOfInterest,
showId=T,cex=3,col.id="black")
```
```{r, echo=T,eval=F,fig.width=10,fig.height=5}
plotTracks(genomeAxis,from=100,to=10100,
range=regionsOfInterest,
showId=T)
```
Plotting regions in Gviz - Data tracks
========================================================
id:datatracks
Now we have some fine control of the axis, it follows that we may want some to display some actual data along side our axis and/or regions of interest.
Gviz contains a general container for data tracks which can be created using the **DataTrack()** constructor function and associated object, **DataTrack**.
Generally **DataTrack** may be used to display most data types with some work but best fits ranges with associated signal as a matrix (multiple regions) or vector (single sample).
Lets update our IRanges object to have some score columns in the metadata columns. We can do this with the **mcols** function as shown in our Bioconductor material.
```{r, echo=T,fig.width=10,fig.height=5}
mcols(regionsOfInterest) <- data.frame(Sample1=c(30,20),Sample2=c(20,200))
regionsOfInterest <- GRanges(seqnames="chr5",ranges = regionsOfInterest)
regionsOfInterest
```
Plotting regions in Gviz - Data tracks
========================================================
Now we have the data we need, we can create a simple **DataTrack** object.
```{r, echo=T,eval=F,fig.width=10,fig.height=5}
dataROI <- DataTrack(regionsOfInterest)
plotTracks(dataROI)
```
```{r, echo=F,fig.width=23,fig.height=5}
dataROI <- DataTrack(regionsOfInterest)
plotTracks(dataROI,cex=3)
```
Plotting regions in Gviz - Data tracks
========================================================
As we have seen, **DataTrack** objects make use of IRanges/GRanges which are the central workhorse of Bioconductors HTS tools.
This means we can take advantage of the many manipulations available in the Bioconductor tool set.
Lets make use of rtracklayer's importing tools to retrieve coverage from a bigWig as a GRanges object
```{r, echo=T,fig.width=10,fig.height=5}
library(rtracklayer)
allChromosomeCoverage <- import.bw("Data/small_Sorted_SRR568129.bw",as="GRanges")
allChromosomeCoverage
```
Plotting regions in Gviz - Data tracks (part 4)
========================================================
Now we have our coverage as a GRanges object we can create our **DataTrack** object from this.
Notice we specify the chromsome of interest in the **chromosome** parameter.
```{r, echo=T,fig.width=10,fig.height=5}
accDT <- DataTrack(allChromosomeCoverage,chomosome="chr5")
accDT
```
Plotting regions in Gviz - Data tracks (part 5)
========================================================
To plot data now using the plotTracks() function we will set the regions we wish to plot by specifying the chromsomes, start and end using the **chromosome**, **from** and **to** parameters.
By default we will get a similar point plot to that seen before.
```{r, echo=T,fig.width=23,fig.height=5}
plotTracks(accDT,
from=134887451,to=134888111,
chromosome="chr5")
```
Plotting regions in Gviz - Data tracks (part 6)
========================================================
We can adjust the type of plots we want using the **type** argument.
Here as with standard plotting we can specify **"l"** to get a line plot.
```{r, echo=T,fig.width=23,fig.height=5}
plotTracks(accDT,
from=134887451,to=134888111,
chromosome="chr5",type="l")
```
Plotting regions in Gviz - Data tracks (part 6)
========================================================
Many other types of plots are available for the DataTracks.
Including smoothed plots using "smooth".
```{r, echo=T,fig.width=23,fig.height=5}
plotTracks(accDT,
from=134887451,to=134888111,
chromosome="chr5",type="smooth")
```
Plotting regions in Gviz - Data tracks (part 7)
========================================================
Histograms by specifying "h".
```{r, echo=T,fig.width=23,fig.height=5}
plotTracks(accDT,
from=134887451,to=134888111,
chromosome="chr5",type="h")
```
Plotting regions in Gviz - Data tracks (part 8)
========================================================
Or filled/smoothed plots using "mountain".
```{r, echo=T,fig.width=23,fig.height=5}
plotTracks(accDT,
from=134887451,to=134888111,
chromosome="chr5",type="mountain")
```
Plotting regions in Gviz - Data tracks (part 9)
========================================================
and even a Heatmap using "heatmap".
Notice that Gviz will automatically produce the appropriate Heatmap scale.
```{r, echo=T,fig.width=23,fig.height=5}
plotTracks(accDT,
from=134887451,to=134888111,
chromosome="chr5",type="heatmap")
```
Plotting regions in Gviz - Additional Parameters.
========================================================
As with all plotting functions in R, Gviz plots are highly customisable.
Simple features such as point size and colour are easily set as for standard R plots using **cex** and **col** paramters.
```{r, echo=T,fig.width=23,fig.height=5}
plotTracks(accDT,
from=134887451,to=134888111,
chromosome="chr5",
col="red",cex=4)
```
Putting track togethers - Axis and Data
========================================================
Now we have shown how to construct a data track and axis track we can put them together in one plot.
To do this we simply provide the GenomeAxisTrack and DataTrack objects as vector the **plotTracks()** function.
```{r, echo=T,fig.width=25,fig.height=5}
plotTracks(c(accDT,genomeAxis),
from=134887451,to=134888111,
chromosome="chr5"
)
```
Putting track togethers - Ordering tracks in plot
========================================================
The order of tracks in the plot is simply defined by the order they are placed in the vector passed to **plotTracks()**
```{r, echo=T,fig.width=25,fig.height=5}
plotTracks(c(genomeAxis,accDT),
from=134887451,to=134888111,
chromosome="chr5"
)
```
Putting track togethers - Controling height of tracks in plot
========================================================
By default, Gviz will try and provide sensible track heights for your plots to best display your data.
The track height can be controlled by providing a vector of relative heights to the **sizes** paramter of the **plotTracks()** function.
If we want the axis to be 50% of the height of the Data track we specify the size for axis as 0.5 and that of data as 1.
The order of sizes must match the order of objects they relate to.
```{r, echo=T,fig.width=25,fig.height=5}
plotTracks(c(genomeAxis,accDT),
from=134887451,to=134888111,
chromosome="chr5",
sizes=c(0.5,1)
)
```
Exercises
========================================================
Time for exercises! [Link here](https://mrccsc.github.io/VisualisingGenomicsData/exercises/AxisAndDataTrack_Exercises.html)
Solutions
========================================================
Time for solutions! [Link here](https://mrccsc.github.io/VisualisingGenomicsData/solutions/AxisAndDataTrack_Solutions.html)
Adding annotation to plots.
========================================================
id:Annotation
Genomic annotation, such as Gene/Transcript models, play an important part of visualising genomics data in context.
Gviz provides many routes for constructing genomic annotation using the **AnnotationTrack()** constructor function. In contrast to the **DataTrack**, **AnnotationTrack** allows for the specification of feature groups.
First lets create a GRanges object with some more regions
```{r, echo=T,fig.width=10,fig.height=5}
toGroup <- GRanges(seqnames="chr5",
IRanges(
start=c(10,500,550,2000,2500),
end=c(300,800,850,2300,2800)
))
names(toGroup) <- seq(1,5)
toGroup
```
Adding annotation to plots. Grouping (part-1)
========================================================
Now we can create the **AnnotationTrack** object using the constructor.
Here we also provide a grouping to the **group** parameter in the **AnnotationTrack()** function.
```{r, echo=T,fig.width=23,fig.height=5}
annoT <- AnnotationTrack(toGroup,
group = c("Ann1",
"Ann1",
"Ann2",
"Ann3",
"Ann3"))
plotTracks(annoT)
```
Adding annotation to plots.
========================================================
We can see the features are displayed grouped by lines.
But if we want to see the names we must specify the group parameter by using the **groupAnnotation** argument.
```{r, echo=T,fig.width=23,fig.height=5}
plotTracks(annoT,groupAnnotation = "group")
```
Adding annotation to plots. Strands and direction.
========================================================
When we created the GRanges used here we did not specify any strand information.
```{r, echo=T,fig.width=10,fig.height=5}
strand(toGroup)
```
When plotting annotation without strand a box is used to display features as seen in previous slides
Adding annotation to plots. Strands and direction (part-2).
========================================================
Now we can specify some strand information for the GRanges and replot.
Arrows now indicate the strand which the features are on.
```{r, echo=T,fig.width=23,fig.height=5}
strand(toGroup) <- c("+","+","*","-","-")
annoT <- AnnotationTrack(toGroup,
group = c("Ann1",
"Ann1",
"Ann2",
"Ann3",
"Ann3"))
plotTracks(annoT, groupingAnnotation="group")
```
Adding annotation to plots. Controlling the display density
========================================================
In the IGV course we saw how you could control the display density of certain tracks.
Annotation tracks are often stored in files such as the general feature format (see our previous course).
IGV allows us to control the density of these tracks in the view options by setting to "collapsed", "expanded" or "squished".
Whereas "squished" and "expanded" maintains much of the information within the tracks, "collapsed" flattens overlapping features into a single displayed feature.
```{r, echo=T,fig.width=10,fig.height=5}
```
Adding annotation to plots. Controlling the display density (part 2)
========================================================
Here we have the same control over the display density of our annotation tracks.
By default the tracks are stacked using the "squish" option to make best use of the available space.
```{r, echo=F,fig.width=25,fig.height=5}
toGroup <- GRanges(seqnames="chr5",
IRanges(
start=c(100,100,500,700,2000,2500),
end=c(300,300,800,1050,2300,2800)
))
names(toGroup) <- seq(1,6)
#toGroup
strand(toGroup) <- c("*","*","*","*","*","*")
annoT <- AnnotationTrack(toGroup,
group = c("Ann1",
"Ann2",
"Ann1",
"Ann2",
"Ann3",
"Ann3"))
```
```{r, echo=T,fig.width=25,fig.height=5}
plotTracks(annoT, groupingAnnotation="group",stacking="squish")
```
Adding annotation to plots. Controlling the display density (part 3)
========================================================
By setting the **stacking** parameter to "dense", all overlapping features have been collapsed/flattened
```{r, echo=T,fig.width=25,fig.height=5}
plotTracks(annoT, groupingAnnotation="group",stacking="dense")
```
Adding annotation to plots. Feature types.
========================================================
**AnnotationTrack** objects may also hold information on feature types.
For gene models we may be use to feature types such as mRNA, rRNA, snoRNA etc.
Here we can make use of feature types as well.
We can set any feature types within our data using the **feature()** function. Here they are unset so displayed as unknown.
```{r, echo=T,fig.width=10,fig.height=5}
feature(annoT)
```
Adding annotation to plots. Setting feature types.
========================================================
We can set our own feature types for the **AnnotationTrack** object using the same **feature()** function.
We can choose any feature types we wish to define.
```{r, echo=T,fig.width=10,fig.height=5}
feature(annoT) <- c(rep("Good",4),rep("Bad",2))
feature(annoT)
```
Adding annotation to plots. Display feature types.
========================================================
Now we have defined our feature types we can use this information within our plots.
In GViz, we can directly specify attributes for individual feature types within our AnnotationTrack, in this example we add attributes for colour to be displayed.
We specify the "Good" features as blue and the "Bad" features as red.
```{r, echo=T,fig.width=25,fig.height=5}
plotTracks(annoT, featureAnnotation = "feature",
groupAnnotation = "group",
Good="Blue",Bad="Red")
```
GeneRegionTrack
========================================================
id:grtrack
We have seen how we can display complex annotation using the **AnnotationTrack** objects.
For gene models Gviz contains a more specialised object, the **GeneRegionTrack** object.
The **GeneRegionTrack** object contains additional parameters and display options specific for the display of gene models.
Lets start by looking at the small gene model set stored in the **Gviz** package.
```{r, echo=T,fig.width=10,fig.height=5}
data(geneModels)
head(geneModels)
```
GeneRegionTrack
========================================================
```{r, echo=F,fig.width=10,fig.height=5}
data(geneModels)
head(geneModels)
```
We can see that this data.frame contains information on start, end , chromosome and strand of feature needed to position features in a linear genome.
Also included are a feature type column named "feature" and columns containing additional metadata to group by - "gene","exon","transcript","symbol".
GeneRegionTrack - Setting up the gene model track.
========================================================
We can define a GeneRegionTrack as we would all other tracktypes. Here we provide a genome name, chromosome of interest and a name for the track.
```{r, echo=T,fig.width=23,fig.height=5}
grtrack <- GeneRegionTrack(geneModels, genome = "hg19",
chromosome = "chr7",
name = "smallRegions")
plotTracks(grtrack)
```
GeneRegionTrack - Setting up the gene model track.
========================================================
```{r, echo=T,fig.width=23,fig.height=5}
plotTracks(grtrack)
```
We can see that features here are rendered slightly differently to those in an **AnnotationTrack** object.
Here direction is illustrated by arrows in introns and unstranslated regions are shown as narrower boxes.
GeneRegionTrack - Specialised labelling.
========================================================
Since gene models typically contain exon, transcript and gene level annotation we can specify how we wish to annotate our plots by using the **transcriptAnnotation** and **exonAnnotation** parameters.
To label all transcripts by the gene annotation we specify the gene column to the **transcriptAnnotation** parameter.
```{r, echo=T,fig.width=23,fig.height=5}
plotTracks(grtrack,transcriptAnnotation="gene")
```
GeneRegionTrack - Specialised labelling.
========================================================
Similarly we can label transcripts by their individual transcript names.
```{r, echo=T,fig.width=23,fig.height=8}
plotTracks(grtrack,transcriptAnnotation="transcript")
```
GeneRegionTrack - Specialised labelling.
========================================================
Or we can label using the **transcriptAnnotation** object by any arbitary column where there is one level per transcript.
```{r, echo=T,fig.width=15,fig.height=5}
plotTracks(grtrack,transcriptAnnotation="symbol")
```
GeneRegionTrack - Specialised labelling of exons.
========================================================
As with transcripts we can label individual features using the **exonAnnotation** parameter by any arbitary column where there is one level per feature/exon.
```{r, echo=T,fig.width=20,fig.height=8}
plotTracks(grtrack,exonAnnotation="exon",from=26677490,to=26686889,cex=0.5)
```
GeneRegionTrack - Specialized display density for gene models.
========================================================
We saw that we can control the display density when plotting **AnnotationTrack** objects.
We can control the display density of GeneRegionTracks in the same manner.
```{r, echo=T,fig.width=20,fig.height=8}
plotTracks(grtrack, stacking="dense")
```
GeneRegionTrack - Specialized display density for gene models.
========================================================
However, since the **GeneRegionTrack** object is a special class of the **AnnotationTrack** object we have special parameter for dealing with display density of transcripts.
The **collapseTranscripts** parameter allows us a finer degree of control than that seen with **stacking** parameter.
Here we set **collapseTranscripts** to be true inorder to merge all overlapping transcripts.
```{r, echo=T,fig.width=15,fig.height=5}
plotTracks(grtrack, collapseTranscripts=T,
transcriptAnnotation = "symbol")
```
GeneRegionTrack - Specialized display density for gene models.
========================================================
Collapsing using the **collapseTranscripts** has summarised our transcripts into their respective gene boundaries.
We have however lost information on the strand of transcripts. To retain this information we need to specify a new shape for our plots using the **shape** parameter. To capture direction we use the "arrow" shape
```{r, echo=T,fig.width=15,fig.height=5}
plotTracks(grtrack, collapseTranscripts=T,
transcriptAnnotation = "symbol",
shape="arrow")
```
GeneRegionTrack - Specialized display density for gene models.
========================================================
The **collapseTranscripts** function also allows us some additional options by which to collapse our transcripts.
These methods maintain the intron information in the gene model and so get us closer to reproducing the "collapsed" feature in IGV.
Here we may collapse transcripts to the "longest".
```{r, echo=T,fig.width=15,fig.height=5}
plotTracks(grtrack, collapseTranscripts="longest",
transcriptAnnotation = "symbol")
```
GeneRegionTrack - Specialized display density for gene models.
========================================================
Or we may specify to **collapseTranscripts** function to collapse by "meta".
The "meta" option shows us a composite, lossless illustration of the gene models closest to that seen in "collapsed" IGV tracks.
Here importantly all exon information is retained.
```{r, echo=T,fig.width=15,fig.height=5}
plotTracks(grtrack, collapseTranscripts="meta",
transcriptAnnotation = "symbol")
```
GeneRegionTrack - Building your own gene models.
========================================================
We have seen in previous material how gene models are organised in Bioconductor using the **TxDB** objects.
Gviz may be used in junction with **TxDB** objects to construct the **GeneRegionTrack** objects.
We saw in the Bioconductor and ChIPseq course that many genomes have pre-build gene annotation within the respective TxDB libraries. Here we will load a **TxDb** for hg19 from the **TxDb.Hsapiens.UCSC.hg19.knownGene** library.
```{r, echo=TRUE}
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
txdb
```
GeneRegionTrack - Building your own gene models from a TxDb.
========================================================
Now we have loaded our **TxDb** object and assigned it to *txdb*. We can use this **TxDb** object to construct our **GeneRegionTrack**. Here we focus on chromosome 7 again.
```{r, echo=TRUE}
customFromTxDb <- GeneRegionTrack(txdb,chromosome="chr7")
head(customFromTxDb)
```
GeneRegionTrack - Building your own gene models from a TxDb.
========================================================