Make file tutorial

From Genome Analysis Wiki
Revision as of 10:31, 12 June 2015 by Atks

Introduction

GNU Make is a widely used tool for managing the complicated process of compiling a C program. But this is not the only use for this very powerful tool.

A statistical analysis usually involves multiple data preparation steps just to mould the input into a form that is acceptable to the analysis tool. Analysis steps involving large data sets require parallelization, which means partitioning the data into subsets that can be run independently on a cluster. Upon completion of the analyses, the partial outputs have to be merged into a single file again before plots are made to summarize the results. The problem is compounded when one is interested in the effects of multiple parameter settings in an analysis.

All of this often results in hundreds of separate commands, and storing them in a plain text file is probably not the most efficient way to manage them.

Solution

Makefiles are more than just tools for compiling programs. Because a makefile records which outputs depend on which inputs, make can run an analysis pipeline and, when invoked again, rerun only the steps whose targets are missing or out of date.

Makefiles themselves become hard to read when there are many dependencies. Instead of writing one by hand, we can express the analysis pipeline more easily in a script and use that script to generate the Makefile.
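As a rough illustration of this idea, here is a minimal sketch in Python (the choice of scripting language is an assumption; any language would do). It describes a split, analyze and merge pipeline as a list of (target, dependencies, command) tuples and renders them as Makefile rules. The tool names ./split.sh and ./analyze.sh, the file names, and the helper functions build_rules and render_makefile are all hypothetical and only serve the example.

```python
# Sketch: describe a split/analyze/merge pipeline in Python and render it
# as Makefile text. Tool names (./split.sh, ./analyze.sh) and file names
# are hypothetical placeholders, not part of any real package.

def build_rules(chunks):
    """Return a list of (target, dependencies, command) tuples."""
    rules = []
    results = []
    for c in chunks:
        part = f"part.{c}.txt"
        result = f"result.{c}.txt"
        # Step 1: extract one subset of the input.
        rules.append((part, ["input.txt"], f"./split.sh input.txt {c} > {part}"))
        # Step 2: analyze the subset. Subsets do not depend on each other,
        # so make is free to run these rules in parallel.
        rules.append((result, [part], f"./analyze.sh {part} > {result}"))
        results.append(result)
    # Step 3: merge the partial outputs once all of them exist.
    # $^ (all dependencies) and $@ (the target) are GNU Make automatic variables.
    rules.append(("merged.txt", results, "cat $^ > $@"))
    return rules


def render_makefile(rules, final_target):
    """Render the rules as Makefile text with an 'all' target first."""
    lines = [
        ".DELETE_ON_ERROR:",  # GNU Make: remove a target if its command fails
        ".PHONY: all",
        "",
        f"all: {final_target}",
        "",
    ]
    for target, deps, cmd in rules:
        lines.append(f"{target}: {' '.join(deps)}")
        lines.append("\t" + cmd)  # recipe lines must begin with a tab
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_makefile(build_rules(["1", "2", "3"]), "merged.txt"))
```

Assuming the script is saved under a name such as generate.py, redirecting its output to a file named Makefile and then running make -j 3 would let make execute the independent subset rules in parallel. Changing the list of chunks or the commands only requires editing the script and regenerating the Makefile.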