Changes

From Genome Analysis Wiki
Jump to navigationJump to search
Created page with 'Here I showed an example using Goncalo's library (I assume he agreed me to do so).<br> The purpose of this program is (1) to extract a range of text from a input file, and count…'
Here I showed an example using Goncalo's library (I assume he agreed me to do so).<br>

The purpose of this program is (1) to extract a range of text from a input file, and count there frequencies. (2) count which line have the same text as the first line<br>

The code will open a file (specified by -h parameter), take the third field, obtain a range of text (range is specified by starting position using -f and ending position using -t).<br>

<source lang="c">#include "StringArray.h"
#include "StringHash.h"
#include "Parameters.h"
#include "Error.h"

int main(int argc, char ** argv)
{
String filename;
int firstMarker = 0, lastMarker = 10;

ParameterList pl;

pl.Add(new StringParameter('h', "Haplotype File", filename));
pl.Add(new IntParameter('f', "From Position", firstMarker));
pl.Add(new IntParameter('t', "To Position", lastMarker));
pl.Read(argc, argv);
pl.Status();

IFILE input = ifopen(filename, "rt");

if (input == NULL)
error("Failed to open file %s", (const char *) filename);

String buffer;
StringArray tokens;
StringArray haplos, names;
StringIntHash haploCounts;

while (!ifeof(input))
{
buffer.ReadLine(input);
tokens.ReplaceTokens(buffer);

if (buffer.Length() > 0 && tokens.Length() != 3)
{
printf("Expect 3 words per line but the line beginning with \"%.10s\" looks different ...", (const char *) buffer);
continue;
}

haplos.Push(tokens[2].Mid(firstMarker, lastMarker));
names.Push(tokens[0]);

haploCounts.IncrementCount(tokens[2].Mid(firstMarker, lastMarker));

}

printf("Haplotype Counts\n");
for (int i = 0; i < haploCounts.Capacity(); i++)
if (haploCounts.SlotInUse(i))
printf("%s %d\n", (const char *) haploCounts[i], haploCounts.Integer(i));

printf("Haplotypes that match the first one\n");
for (int i = 1; i < haplos.Length(); i++)
if (haplos[i] == haplos[0])
printf("%s (%d)\n", (const char *) names[i], i + 1);
printf("\n");

ifclose(input);
}</source>

A example input, say INPUT.txt is like:<br>

WTCCC66061-&gt;WTCCC66061 HAPLO1 AGACTCTGATAGCGATAACC<br>WTCCC66061-&gt;WTCCC66061 HAPLO2 GGGTTCCGATGGCGATAACC<br>WTCCC66062-&gt;WTCCC66062 HAPLO1 AGACTCTGATGGCGCTAACC<br>WTCCC66062-&gt;WTCCC66062 HAPLO2 AGACTCTGATAGCGATGATC<br>WTCCC66063-&gt;WTCCC66063 HAPLO1 AGACTCTTATGGCGCTAGCC<br>WTCCC66063-&gt;WTCCC66063 HAPLO2 AGACTCTTATAGCGATAACC<br>WTCCC66064-&gt;WTCCC66064 HAPLO1 AGACTCTGATGGCGATAGCC<br>WTCCC66064-&gt;WTCCC66064 HAPLO2 AGACTCTGATGACGCTAGCC<br>WTCCC66065-&gt;WTCCC66065 HAPLO1 AGACTCTGATGGCGATAACC<br>WTCCC66065-&gt;WTCCC66065 HAPLO2 AGACTCTGATGGCGATAGCC<br>

And if we run "extractHaplo -h INPUT.txt -f 1 -t 3", and the ouput looks like<br>

The following parameters are in effect:<br> Haplotype File : INPUT.txt (-hname)<br> From Position : 1 (-f9999)<br> To Position : 3 (-t9999)

Haplotype Counts<br>GGT 1<br>GAC 9<br>60 1<br>Haplotypes that match the first one<br>WTCCC66062-&gt;WTCCC66062 (3)<br>WTCCC66062-&gt;WTCCC66062 (4)<br>WTCCC66063-&gt;WTCCC66063 (5)<br>WTCCC66063-&gt;WTCCC66063 (6)<br>WTCCC66064-&gt;WTCCC66064 (7)<br>WTCCC66064-&gt;WTCCC66064 (8)<br>WTCCC66065-&gt;WTCCC66065 (9)<br>WTCCC66065-&gt;WTCCC66065 (10)<br><br>

*To read a file, use IFILE class, which is wrapper for read/write file. A particular useful thing is that it handle gzipped file transparently. Important functions are: ifopen(), ifclose().<br>
*To handle strings, we prefer to use String class. In this area, handling string is a versatile task. A String class encapsulate basic operations such as index[], append(+), equality(=), extract(Left, Right, Mid). String class can be seamlessly used with IFILE class to acces file. See the while loop in the example code and notice the function ReadLine().<br>
*To tokenize a String class, we can use StringArray class. It has ReplaceToken() which will store each token field like an array.<br>
*To associate a String class to a integer type, there is a class named StringIntHash, important functions are IncrementCount(), Capacity() and SlotInUse().

<br>
255

edits

Navigation menu