FANDOM


DescriptionEdit

How does the GC content of the genome vary?

OutputEdit

The following image shows the GC content estimates along chromosome 6 (build 36 of the human genome).

Chr6


We can see that the GC content along this chromosome varies from about 34% up to 54%

Running the scriptEdit

> perl ./estimate_GC_in_bins.pl input_FASTA_file.fa bin_size_in_bp output_file

RequirementsEdit

This script requires Perl. It calculates the base composition using a FASTA file. FASTA files for the human genome can be found from the UCSC website: http://hgdownload.cse.ucsc.edu/downloads.html#human

CodeEdit

Script 1 - estimate_GC_in_bins.plEdit

#!/usr/bin/perl

$narg = $#ARGV+1;
if ($narg < 3)
{
	die("Require an input fasta file, a bin size (in bp), and an output file\n");
}
$fasta = $ARGV[0];
$bin_size = $ARGV[1];
$output = $ARGV[2];

use Switch;

open(IN, '<', $fasta) or die("Could not open FASTA file: $fasta \n");
open(OUT, '>', $output) or die("Could not open output file: $output \n");
print OUT "CHROM\tBINSTART\tBINEND\tN_GC\tN_ACGT\tGC_PRCT\n";
while($line = <IN>)
{
	chomp($line);
	if (substr($line, 0, 1) eq ">")
	{
		$chr = substr($line, 1);
		$bin_start = 0;
		$bin_end = $bin_start + $bin_size;
		$bin_GC = 0;
		$bin_nonN = 0;
		$pos = 0;
		print "$chr\n";
		next;
	}
	$line = uc($line);

	if ($pos + length($line) + 1 < $bin_end)
	{
		$count = $line =~ tr/ACGT//;
		$bin_nonN += $count;
		$count = $line =~ tr/CG//;
		$bin_GC += $count;
		$pos += length($line);
	}
	else
	{
		@bases = split(, $line);
		for ($i=0; $i<=$#bases; $i++)
		{
			switch ($bases[$i])
			{
				case [A,T] { $bin_nonN++; }
				case [G,C] { $bin_nonN++; $bin_GC++; }
			}
			$pos++;

			if ($pos == $bin_end)
			{
				$GC = "NaN";
				if ($bin_nonN != 0)
				{
					$GC = 100.0 * $bin_GC / $bin_nonN;
				}
				print OUT "$chr\t$bin_start\t$bin_end\t$bin_GC\t$bin_nonN\t$GC\n";
				$bin_start += $bin_size;
				$bin_end += $bin_size;
				$bin_GC = 0; $bin_nonN = 0;
			}
		}
	}
}
close(OUT);
close(IN);

Ad blocker interference detected!


Wikia is a free-to-use site that makes money from advertising. We have a modified experience for viewers using ad blockers

Wikia is not accessible if you’ve made further modifications. Remove the custom ad blocker rule(s) and the page will load as expected.