Anube9/Computational-Analysis-of-argR-Motifs-using-R

Computational Analysis of argR Motifs using R

This analysis scans DNA sequences for potential binding sites of the transcription factor argR. The script builds a weight matrix from the provided count data, then scans upstream regulatory regions (provided in the ECOLI file in the file attachments) and reports the regions that score highest against the weight matrix as potential argR binding sites.

1. Importing Data and Libraries:

Load counts matrix: Read the argR counts matrix from a text file and structure it as a data frame with rows as bases (A, C, G, T) and columns as positions in the motif.
Install Biostrings: Install and load the Biostrings package (distributed through Bioconductor) for handling biological sequences.
Load upstream regulatory regions: Read the regulatory sequence file containing upstream regulatory regions into a data frame.
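
The loading step above can be sketched in base R as follows. This is a minimal sketch, not the repository's script: the file layout (one row per base, label in the first column), the toy counts, and the names `counts_path`, `counts`, and `upstream` are all assumptions made for illustration. The real argR counts file and ECOLI file may be laid out differently.

```r
# Toy counts file: 4 rows (A, C, G, T) x 5 motif positions.
# These numbers are made up for illustration; they are NOT the real argR counts.
counts_path <- tempfile(fileext = ".txt")
writeLines(c("A 3 0 9 1 2",
             "C 2 1 0 4 3",
             "G 1 8 0 2 1",
             "T 4 1 1 3 4"), counts_path)

# Read the file; the first column is assumed to hold the base labels.
counts <- read.table(counts_path, header = FALSE, row.names = 1)
colnames(counts) <- paste0("pos", seq_len(ncol(counts)))

# Stand-in for the upstream regions file: a data frame with one row per gene.
# Gene IDs and sequences here are hypothetical.
upstream <- data.frame(
  gene_id  = c("geneA", "geneB"),
  sequence = c("ACGTAGGATCA", "TTGACATGCAA"),
  stringsAsFactors = FALSE
)

print(counts)
```

Keeping the bases as row names makes later lookups by base letter straightforward.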

2. Calculating Frequency and Weight Matrices:

Frequency matrix: Calculate the frequency matrix from the counts matrix by dividing each count by the total count in its column.
Pseudocounts: Add 1 to each count in the matrix, creating a pseudocounts matrix. This avoids zero probabilities, which would otherwise produce a log-odds score of negative infinity.
Weight matrix: Calculate the weight matrix as log-odds scores: the logarithm of the ratio between each pseudocount-adjusted frequency and the background frequency (0.25 for each base).
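
The three calculations above might look like this in base R. It is a sketch under stated assumptions: the counts are invented, and the use of `log2` is an assumption, since the description does not say which logarithm base the script uses.

```r
# Toy counts matrix (rows = bases, columns = motif positions).
# Values are made up for illustration.
counts <- matrix(
  c(3, 0, 9, 1,
    2, 1, 0, 4,
    1, 8, 0, 2,
    4, 1, 1, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Frequency matrix: each count divided by the total of its column.
freq <- sweep(counts, 2, colSums(counts), "/")

# Pseudocounts: add 1 to every cell, then renormalise each column.
pseudo <- counts + 1
pseudo_freq <- sweep(pseudo, 2, colSums(pseudo), "/")

# Weight matrix: log-odds against a uniform background of 0.25 per base.
# log2 is an assumption; the natural log would only rescale the scores.
background <- 0.25
weights <- log2(pseudo_freq / background)

print(round(weights, 2))
```

Because every cell is at least 1 after adding pseudocounts, all entries of `weights` are finite, even where the original count was zero.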

3. Scanning Upstream Regions for Binding Sites:

Define motif length: Determine the length of the motif from the number of columns in the weight matrix.
Loop through sequences: Repeat the next three steps (extract subsequences, calculate scores, find maximum score) for each gene ID and sequence in the upstream regions data frame.
Extract subsequences: Extract all subsequences of the specified motif length from the sequence.
Calculate scores: For each subsequence, calculate a score by summing the weight-matrix value for the base observed at each position.
Find maximum score: Record the maximum score for that gene ID.
Sort scores: Sort all gene IDs by their maximum scores in descending order.
Print top 30 gene IDs: Display the 30 gene IDs with the highest scores; their upstream regions are the most likely to contain argR binding sites.
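
The scanning procedure above can be sketched in base R as follows. The weight matrix, gene IDs, and sequences are invented toy data, and the helper names `score_window` and `max_score` are hypothetical; the repository's script may structure this loop differently.

```r
# Toy weight matrix (rows = bases, columns = motif positions).
# Values are invented for illustration only.
weights <- matrix(
  c( 1.0, -1.0,  0.5,
    -0.5,  1.2, -1.0,
    -1.0, -0.8,  1.1,
     0.3,  0.1, -0.7),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Hypothetical upstream regions.
upstream <- data.frame(
  gene_id  = c("geneA", "geneB", "geneC"),
  sequence = c("TTACGTT", "GGGTTT", "ACT"),
  stringsAsFactors = FALSE
)

# Score one window: look up each base's weight at its position and sum.
# A character outside A/C/G/T (for example N) yields NA for that window.
score_window <- function(window, weights) {
  bases <- strsplit(window, "")[[1]]
  sum(weights[cbind(match(bases, rownames(weights)), seq_along(bases))])
}

# Best score over every window of motif length in one sequence.
max_score <- function(sequence, weights) {
  motif_len <- ncol(weights)
  n_windows <- nchar(sequence) - motif_len + 1
  if (n_windows < 1) return(NA_real_)  # sequence shorter than the motif
  starts <- seq_len(n_windows)
  windows <- substring(sequence, starts, starts + motif_len - 1)
  max(vapply(windows, score_window, numeric(1), weights = weights))
}

upstream$max_score <- vapply(upstream$sequence, max_score, numeric(1),
                             weights = weights, USE.NAMES = FALSE)

# Sort by maximum score, highest first, and keep at most 30 rows.
ranked <- upstream[order(upstream$max_score, decreasing = TRUE), ]
top_hits <- head(ranked[, c("gene_id", "max_score")], 30)
print(top_hits)
```

Indexing the matrix with a two-column (row, column) index picks one weight per motif position in a single step, which avoids an inner loop over positions.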
