Makefile tutorial
Introduction
GNU Make is often thought of as a tool for managing the compilation of large C programs. This is true, but its potential is not limited to this!
At its core, it is a generic pipelining framework that is aware of dependencies and can run steps in parallel.
Statistical genetics analyses often require multiple steps: preparing the data, running computationally expensive analyses and then collating the results.
Make can potentially save you lots of time and hair pulling, especially when your supervisor asks for ALL the analyses again, but this time only with rare variants.
Make also lets you redo part of an analysis: carefully delete the files produced by the steps that need to change, and make reruns only those steps and the steps that depend on them.
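The rule make uses to decide what to rerun is simple: a target is remade if it does not exist, or if any of its prerequisites was modified more recently than the target. Below is a minimal Perl sketch of that check under simplifying assumptions (plain files only; phony targets, pattern rules and the other refinements GNU Make applies are ignored). The function `needs_rebuild` is hypothetical and is not part of make or of the script in this tutorial.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Simplified model of make's out-of-date check (NOT GNU Make's code):
# a target must be remade if it is missing, or if any prerequisite
# has a more recent modification time than the target.
sub needs_rebuild
{
    my ($target, @prereqs) = @_;
    return 1 unless -e $target;
    my $target_mtime = (stat($target))[9];
    for my $p (@prereqs)
    {
        # simplification: a missing prerequisite also forces a rebuild
        return 1 unless -e $p;
        my $p_mtime = (stat($p))[9];
        return 1 if $p_mtime > $target_mtime;
    }
    return 0;
}

# Demonstration with two empty marker files in a temporary directory,
# named after the tutorial's 1.OK and all.log.OK targets.
my $dir = tempdir(CLEANUP => 1);
my ($prereq, $target) = ("$dir/1.OK", "$dir/all.log.OK");
for my $f ($prereq, $target)
{
    open(my $fh, ">", $f) or die "Cannot create $f: $!\n";
    close($fh);
}
my $now = time();
utime($now - 10, $now - 10, $prereq);   # prerequisite finished first
utime($now, $now, $target);             # target is newer: nothing to do
print "all.log.OK needs rebuild: ", needs_rebuild($target, $prereq), "\n";

unlink($target);                        # deleting the marker forces a rerun
print "after deleting all.log.OK: ", needs_rebuild($target, $prereq), "\n";
```

This is why deleting a file is enough to trigger a partial rerun: the missing target fails the first test, and once it is remade it is newer than everything that depends on it, so those targets fail the second test.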
Example
This example does the following:
- generate 100 log files, each with its own number written to it
- concatenate the 100 log files into one file
- delete the 100 log files
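These three steps form a small dependency graph: the 100 generation jobs do not depend on each other, the concatenation needs all of them, and the cleanup needs the concatenation. The Perl sketch below (shrunk to 3 log files; target names follow the script's `$i.OK`, `all.log.OK` and `cleaned.OK` markers) groups the targets into waves in which every target's prerequisites are already done. The wave model and the `waves` function are only an illustration: GNU Make does not compute waves, it simply starts any target whose prerequisites are up to date, up to the limit set by `-j`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The example's dependency graph, shrunk to 3 log files (the tutorial uses 100).
# Keys are targets (the .OK marker files), values are their prerequisites.
my $n = 3;
my %deps = ();
$deps{"$_.OK"} = [] for 1..$n;                  # step 1: no prerequisites
$deps{"all.log.OK"} = [map("$_.OK", 1..$n)];    # step 2: every step-1 marker
$deps{"cleaned.OK"} = ["all.log.OK"];           # step 3: the concatenation

# Group targets into waves: each target's prerequisites all lie in earlier
# waves, so the targets within one wave are independent of each other.
sub waves
{
    my ($deps) = @_;
    my %done = ();
    my @waves = ();
    while (keys(%done) < keys(%$deps))
    {
        # targets not yet done whose prerequisites are all done
        my @ready = sort { $a cmp $b } grep {
            my $t = $_;
            !$done{$t} && !grep { !$done{$_} } @{$deps->{$t}}
        } keys(%$deps);
        die "cycle in dependency graph\n" unless @ready;
        $done{$_} = 1 for @ready;
        push(@waves, \@ready);
    }
    return @waves;
}

my @w = waves(\%deps);
for my $i (0..$#w)
{
    printf("wave %d: %s\n", $i + 1, join(" ", @{$w[$i]}));
}
```

The first wave holds all the independent generation jobs, which is where `make -j 100` gains its speed; the concatenation and the cleanup each wait for everything before them.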
The example files may be found in /net/fantasia/home/atks/makefile_tutorial
#generate make file using perl script
./generate_simple_stuff
#generate make file using perl script to launch jobs on slurm
./generate_simple_stuff -l slurm
#generate make file using perl script to launch jobs on slurm
#files are stored in <dir>, which must be given as an absolute path
./generate_simple_stuff -l slurm -o <dir>
#run make file sequentially
make -f simple_stuff.mk
#run make file in parallel, with at most 100 jobs at a time
make -f simple_stuff.mk -j 100
#clear files from the run
make -f simple_stuff.mk clean
Script
#!/usr/bin/perl -w
use warnings;
use strict;
use POSIX;
use Getopt::Long;
use File::Path;
use File::Basename;
use Pod::Usage;
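=head1 NAME

generate_simple_stuff_makefile

=head1 SYNOPSIS

 generate_simple_stuff_makefile [options]

  -o  output directory : location of all output files (default: current directory)
  -l  launch method    : local or slurm (default: local)
  -m  output make file (default: simple_stuff.mk)
  -h  show this help

 example: ./generate_simple_stuff_makefile.pl

=head1 DESCRIPTION

Generates a make file that writes 100 numbered log files, concatenates them
into all.log and then deletes the individual log files. Each completed step
is recorded by touching a .OK file.

=cut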
#option variables
my $help;
my $verbose;
my $debug;
my $outputDir = getcwd();
my $makeFile = "simple_stuff.mk";
my $launchMethod = "local";
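#initialize options
#options that take a value use =s so that a missing value is an error
#(with :s, a bare -o would silently set the output directory to "")
Getopt::Long::Configure ('bundling');
if(!GetOptions ('h'=>\$help, 'v'=>\$verbose, 'd'=>\$debug,
                'o=s'=>\$outputDir,
                'l=s'=>\$launchMethod,
                'm=s'=>\$makeFile)
   || $help
   || !defined($outputDir)
   || scalar(@ARGV)!=0)
{
    #-h must be checked explicitly: GetOptions succeeds when -h is given
    if ($help)
    {
        pod2usage(-verbose => 2);
    }
    else
    {
        pod2usage(1);
    }
}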
if ($launchMethod ne "local" && $launchMethod ne "slurm")
{
    print STDERR "Launch method has to be local or slurm\n";
    exit(1);
}
##############
#print options
##############
printf("Options\n");
printf("\n");
printf("output directory : %s\n", $outputDir);
printf("launch method : %s\n", $launchMethod);
printf("\n");
#cluster node numbers 140-171 as a comma separated list
#(not used by the rules generated below)
my @nodes = ();
for my $i (140..171)
{
    push(@nodes, "$i");
}
my $nodes = join(",", @nodes);
#arrays for storing targets, dependencies and commands
my @tgts = ();
my @deps = ();
my @cmds = ();
#temporary variables
my $tgt;
my $dep;
my @cmd;
mkpath($outputDir);
my $inputFiles = "";
my $inputFilesOK = "";
my $inputFile = "";
my $outputFile = "";
######################
#1. Generate 100 files
######################
for my $i (1..100)
{
    $inputFiles .= " $outputDir/$i.log";
    $inputFilesOK .= " $outputDir/$i.OK";
    $tgt = "$outputDir/$i.OK";
    $dep = "";
    @cmd = ("echo $i > $outputDir/$i.log");
    makeJob($launchMethod, $tgt, $dep, @cmd);
}
#########################
#2. Concatenate 100 files
#########################
$outputFile = "$outputDir/all.log";
$tgt = "$outputFile.OK";
$dep = $inputFilesOK;
@cmd = ("cat $inputFiles > $outputFile");
makeJob($launchMethod, $tgt, $dep, @cmd);
###########################
#3. Cleanup temporary files
###########################
$tgt = "$outputDir/cleaned.OK";
$dep = "$outputDir/all.log.OK";
@cmd = ("rm $inputFiles");
makeJob($launchMethod, $tgt, $dep, @cmd);
#*******************
#Write out make file
#*******************
open(my $mak, ">", $makeFile) || die "Cannot open $makeFile: $!\n";
print $mak ".DELETE_ON_ERROR:\n\n";
#all and clean are not files, so mark them as phony
print $mak ".PHONY: all clean\n\n";
print $mak "all: @tgts\n\n";
#clean
push(@tgts, "clean");
push(@deps, "");
push(@cmds, "\t-rm -rf $outputDir/*.OK $outputDir/*.log");
for(my $i=0; $i < @tgts; ++$i)
{
    print $mak "$tgts[$i]: $deps[$i]\n";
    print $mak "$cmds[$i]\n";
}
close($mak);
##########
#functions
##########
#run a job either locally or by slurm
sub makeJob
{
    my ($method, $tgt, $dep, @cmd) = @_;
    if ($method eq "local")
    {
        makeLocalStep($tgt, $dep, @cmd);
    }
    elsif ($method eq "slurm")
    {
        makeSlurm($tgt, $dep, @cmd);
    }
}
#run slurm jobs
sub makeSlurm
{
    my ($tgt, $dep, @cmd) = @_;
    push(@tgts, $tgt);
    push(@deps, $dep);
    my $cmd = "";
    for my $c (@cmd)
    {
        $cmd .= "\tsrun " . $c . "\n";
    }
    $cmd .= "\ttouch $tgt\n";
    push(@cmds, $cmd);
}
#run a local job
sub makeLocalStep
{
    my ($tgt, $dep, @cmd) = @_;
    push(@tgts, $tgt);
    push(@deps, $dep);
    my $cmd = "";
    for my $c (@cmd)
    {
        $cmd .= "\t" . $c . "\n";
    }
    $cmd .= "\ttouch $tgt\n";
    push(@cmds, $cmd);
}
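makeLocalStep and makeSlurm differ only in the text placed in front of each command. The sketch below folds them into one hypothetical helper, `format_rule`, which takes that prefix as an argument and returns the rule text instead of pushing onto @tgts, @deps and @cmds, so you can see exactly what lands in simple_stuff.mk. Every recipe line must start with a tab, and the .OK marker is touched as the last command, so it only exists if all earlier commands succeeded (make stops a recipe at the first failing line).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper combining makeLocalStep and makeSlurm: the only
# difference between them is the prefix placed before each command
# ("" for local jobs, "srun " for slurm jobs). Returns one make rule as text.
sub format_rule
{
    my ($prefix, $tgt, $dep, @cmd) = @_;
    my $rule = "$tgt: $dep\n";
    # recipe lines must begin with a tab character
    $rule .= "\t$prefix$_\n" for @cmd;
    # the marker is touched last, so it exists only if every command succeeded
    $rule .= "\ttouch $tgt\n";
    return $rule;
}

print format_rule("", "out/1.OK", "", "echo 1 > out/1.log");
print "\n";
print format_rule("srun ", "out/all.log.OK", "out/1.OK out/2.OK",
                  "cat out/1.log out/2.log > out/all.log");
```

Keeping the launcher as data rather than as a separate function would make adding another launch method a one-line change, at the cost of a small departure from the structure of the original script.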
Solution
Makefiles are more than just tools for compiling programs. The dependency structure of a makefile allows one to run and rerun an analysis pipeline in a convenient fashion.
Makefiles themselves become hard to read when there are many targets and dependencies. Instead, we can describe the analysis pipeline in a script, where loops and variables make it easy to express, and use the script to generate the Makefile.