An example of using libcsg

From Genome Analysis Wiki
Revision as of 18:57, 14 January 2010 by Zhanxw (talk | contribs) (Created page with 'Here I showed an example using Goncalo's library (I assume he agreed me to do so).<br> The purpose of this program is (1) to extract a range of text from a input file, and count…')
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

Here I showed an example using Goncalo's library (I assume he agreed me to do so).

The purpose of this program is (1) to extract a range of text from a input file, and count there frequencies. (2) count which line have the same text as the first line

The code will open a file (specified by -h parameter), take the third field, obtain a range of text (range is specified by starting position using -f and ending position using -t).

#include "StringArray.h"
#include "StringHash.h"
#include "Parameters.h"
#include "Error.h"

int main(int argc, char ** argv)
   {
   String filename;
   int    firstMarker = 0, lastMarker = 10;

   ParameterList pl;

   pl.Add(new StringParameter('h', "Haplotype File", filename));
   pl.Add(new IntParameter('f', "From Position", firstMarker));
   pl.Add(new IntParameter('t', "To Position", lastMarker));
   pl.Read(argc, argv);
   pl.Status();

   IFILE input = ifopen(filename, "rt");

   if (input == NULL)
      error("Failed to open file %s", (const char *) filename);

   String buffer;
   StringArray tokens;
   StringArray haplos, names;
   StringIntHash  haploCounts;

   while (!ifeof(input))
       {
       buffer.ReadLine(input);
       tokens.ReplaceTokens(buffer);

       if (buffer.Length() > 0 && tokens.Length() != 3)
           {
           printf("Expect 3 words per line but the line beginning with \"%.10s\" looks different ...", (const char *) buffer);
           continue;
           }

       haplos.Push(tokens[2].Mid(firstMarker, lastMarker));
       names.Push(tokens[0]);

       haploCounts.IncrementCount(tokens[2].Mid(firstMarker, lastMarker));

       }

   printf("Haplotype Counts\n");
   for (int i = 0; i < haploCounts.Capacity(); i++)
       if (haploCounts.SlotInUse(i))
          printf("%s %d\n", (const char *) haploCounts[i], haploCounts.Integer(i));

   printf("Haplotypes that match the first one\n");
   for (int i = 1; i < haplos.Length(); i++)
       if (haplos[i] == haplos[0])
          printf("%s (%d)\n", (const char *) names[i], i + 1);
   printf("\n");

   ifclose(input);
   }

A example input, say INPUT.txt is like:

WTCCC66061->WTCCC66061 HAPLO1 AGACTCTGATAGCGATAACC
WTCCC66061->WTCCC66061 HAPLO2 GGGTTCCGATGGCGATAACC
WTCCC66062->WTCCC66062 HAPLO1 AGACTCTGATGGCGCTAACC
WTCCC66062->WTCCC66062 HAPLO2 AGACTCTGATAGCGATGATC
WTCCC66063->WTCCC66063 HAPLO1 AGACTCTTATGGCGCTAGCC
WTCCC66063->WTCCC66063 HAPLO2 AGACTCTTATAGCGATAACC
WTCCC66064->WTCCC66064 HAPLO1 AGACTCTGATGGCGATAGCC
WTCCC66064->WTCCC66064 HAPLO2 AGACTCTGATGACGCTAGCC
WTCCC66065->WTCCC66065 HAPLO1 AGACTCTGATGGCGATAACC
WTCCC66065->WTCCC66065 HAPLO2 AGACTCTGATGGCGATAGCC

And if we run "extractHaplo -h INPUT.txt -f 1 -t 3", and the ouput looks like

The following parameters are in effect:
Haplotype File : INPUT.txt (-hname)
From Position : 1 (-f9999)
To Position : 3 (-t9999)

Haplotype Counts
GGT 1
GAC 9
60 1
Haplotypes that match the first one
WTCCC66062->WTCCC66062 (3)
WTCCC66062->WTCCC66062 (4)
WTCCC66063->WTCCC66063 (5)
WTCCC66063->WTCCC66063 (6)
WTCCC66064->WTCCC66064 (7)
WTCCC66064->WTCCC66064 (8)
WTCCC66065->WTCCC66065 (9)
WTCCC66065->WTCCC66065 (10)

  • To read a file, use IFILE class, which is wrapper for read/write file. A particular useful thing is that it handle gzipped file transparently. Important functions are: ifopen(), ifclose().
  • To handle strings, we prefer to use String class. In this area, handling string is a versatile task. A String class encapsulate basic operations such as index[], append(+), equality(=), extract(Left, Right, Mid). String class can be seamlessly used with IFILE class to acces file. See the while loop in the example code and notice the function ReadLine().
  • To tokenize a String class, we can use StringArray class. It has ReplaceToken() which will store each token field like an array.
  • To associate a String class to a integer type, there is a class named StringIntHash, important functions are IncrementCount(), Capacity() and SlotInUse().