Computational approaches for identification of active regulatory regions in plant genomes

Authors Affiliation(s)

Data Science and Informatics, E.I. DuPont Knowledge Center, ICICI Knowledge Park, Genome Valley, Hyderabad 500078, Telangana, INDIA

Can J Biotech, Volume 1, Special Issue-Supplement, Page 234, DOI: https://doi.org/10.24870/cjb.2017-a219

Presenting and *Corresponding author: sunil.kumar@pioneer.com

Abstract

Identifying the mechanisms that govern gene regulation is important for improving gene expression in plants. Past computational approaches for studying regulatory associations, especially those aimed at identifying protein-DNA interactions, were limited by the lack of the large volumes of genetic data needed to identify such loci at a global level. However, recent advances in genomic technologies such as ChIP-seq, ATAC-seq and RNA-seq are driving this discovery process at an unprecedented pace. In the current work, we integrate multiple omics datasets to build regulatory networks in the maize and rice genomes. ATAC-seq, RNA-seq and Bind-n-seq datasets were generated in-house and complemented with data obtained from public resources, such as DNase I and H3K27ac data for different developmental stages of rice. A robust computational pipeline was built to identify open chromatin regions from ATAC-seq data, which were then associated with genome-wide binding sites for specific transcription factors. The motifs for these transcription factors were discovered de novo using the Bind-n-seq method. The discovered motifs were used to predict binding sites across the genome. The putative binding sites falling in open chromatin regions were then analyzed in the context of genes differentially expressed across developmental stages. This integrative analysis has led us to identify various active regulatory regions in cis-promoters as well as distal enhancers.
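The step of retaining only those predicted binding sites that fall in open chromatin can be illustrated with a minimal sketch. This is not the pipeline described in the abstract; it assumes that both the ATAC-seq peaks and the motif-predicted sites are available as BED-style (chromosome, start, end) tuples with half-open coordinates, and that a site counts as "in open chromatin" only when it lies entirely inside a peak. The function names `merge_peaks` and `sites_in_open_chromatin` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_peaks(peaks):
    """Merge overlapping or touching (chrom, start, end) peaks per chromosome.

    Returns a dict mapping chromosome -> sorted list of [start, end] intervals.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Extend the current merged interval.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged


def sites_in_open_chromatin(sites, peaks):
    """Keep predicted binding sites fully contained in an open chromatin peak.

    Assumption (not from the abstract): full containment is required;
    a partial overlap does not count.
    """
    merged = merge_peaks(peaks)
    starts = {chrom: [iv[0] for iv in ivs] for chrom, ivs in merged.items()}

    kept = []
    for chrom, start, end in sites:
        if chrom not in merged:
            continue
        # Last merged peak whose start is <= the site start.
        i = bisect_right(starts[chrom], start) - 1
        if i >= 0 and end <= merged[chrom][i][1]:
            kept.append((chrom, start, end))
    return kept


# Illustrative (made-up) coordinates, not real data.
example_peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
example_sites = [("chr1", 120, 130), ("chr1", 290, 310), ("chr2", 10, 20)]
open_sites = sites_in_open_chromatin(example_sites, example_peaks)
```

In practice an interval tool such as bedtools would typically handle this at genome scale; the sketch only makes the logic of the filtering step explicit. The retained sites could then be assigned to nearby genes and compared against differential expression calls.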