The field of "bioinformatics" is biology plus data science. For the latter, most people find Unix-like operating systems to be the most efficient way to conduct research. Almost all our tools run in Unix, most of them from the command line, so the bioinformatician must know how to move data around, run programs, and chain the output of one program into another to create analysis pipelines.
Metagenomics is a subset of bioinformatics where we deal with unknown communities of organisms composed of bacteria, viruses, fungi, archaea, etc. We deal exclusively with sequence-based data, usually measured in gigabytes and gigabases. Sequencing runs typically produce millions or tens of millions of reads. Our goal is to derive some useful knowledge from uncultured samples. We might try to identify pathogens in human wounds, protein function in open ocean water, or relative abundance of microbes in soil.
All the code examples presented here can be found at:
Authors and Contributors
About the authors
Ken went to college (the University of North Texas, 1990) thinking he might study music and become a professional drummer. He decided against that particular career but didn't have an alternative. He changed his major a couple of times (business, communications) before deciding on an English literature degree just so he could get out of undergrad in five years. NB: never did programming cross his mind, and biology was his most loathed science.
After college, Ken cast about for a job that 1) would be interesting and 2) people would pay him to do. At his first job (1995), he learned Microsoft's Access database on Windows 3.1 and created his company's first website by writing HTML in Notepad and using a Windows FTP client to upload it to their ISP. The die was cast. His next job was writing a technical manual for a piece of software written in Visual Basic. He was offered a position to learn VB and support that program and another Access program. He went on to learn another database (dBASE IV) and language (Delphi). His next job was developing Windows desktop applications in Delphi/InterBase, but he was itching to get into web applications. So his next job (c. 1998) was writing VBScript in Microsoft ASP with SQL Server as the database. At this point, Ken was pretty much fed up with Microsoft and rediscovered Unix.
Ken's first email client in college was "pine" running on the Unix machines, which he accessed through the computer labs. He used "talk" to chat with friends, and so had been exposed to the celebrated "command line." Now he discovered that his ISP offered "shell" accounts on their servers accessible by "telnet." He was reading more about Unix on the Internet and how people were using the Perl programming language with CGI (Common Gateway Interface) to create interactive web pages. The more Ken learned about Unix and Perl and "open source/free" software, the more he realized he'd found his tribe. At his next job (1999), he moved into developing web apps on Linux platforms using the Apache web server with the MySQL database and Perl (the "LAMP" stack).
Around 2001, Ken saw that a very celebrated Perl developer named Lincoln Stein was looking to hire people. Ken got hired to work on the comparative plant genomics database called "Gramene," but really had no idea what Lincoln had done apart from his modules and books. Lincoln was a very important figure in a fairly new field called "bioinformatics" (cf. "How Perl Saved the Human Genome Project" https://www.foo.be/docs/tpj/issues/vol1_2/tpj0102-0001.html), and he ran a research lab at Cold Spring Harbor Laboratory in Cold Spring Harbor, NY. Lincoln hired Ken to write a web-based visual comparative map application (CMap, PMID: 19648141) to augment existing web genome browsers like the UCSC browser, the Ensembl browser, and Lincoln's own GBrowse. This was Ken's entrée into the world of biology and genomics. Around 2004, Lincoln hired Bonnie Hurwitz, who left a few years later to earn her PhD from the University of Arizona. In 2014, Bonnie set up her new lab at the UofA and hired Ken to create the infrastructure for the iMicrobe project (http://imicrobe.us).
Ken currently enjoys research computing, teaching, and the pursuit of his MS at UA. He likes living in Tucson with his family (wife, three children), biking, cooking, and playing music.
Do you like this book? Would you like to support the author? Consider a donation via PayPal to "[email protected]"!