Communicable Diseases Genomics Network


Getting Started on Microbial Genomics

Embarking on training in microbial genomics can be daunting, with so much new terminology to absorb. We asked several members of the CDGN Teaching, Training and Curriculum Working Group to share their experience and the resources they used, in the hope of assisting others on their training journey. This may serve as a starting point for understanding what lies ahead, or as a guide for discussions with a prospective training laboratory, for example for NPAAC/RCPA microbial genomics certification.

We haven’t listed any particular organisms to focus on, but it is important to extend your learning beyond just one laboratory and include key organisms of interest in clinical and public health practice, such as those for which genomics was adopted early: Mycobacteria, SARS-CoV-2, Salmonella, Shigella and more. We hope that the journey described below will assist you on yours. We have included links for the resources mentioned, which are current at the time of writing.

Dr Sanmarié Schlebusch, Co-Chair of the CDGN Teaching, Training and Curriculum Working Group.

A journey by Jo Chua, David Foley, and Tony Allworth

The preamble of the MicroBinfie Podcast encapsulates the complexity of starting to learn microbial genomics:

 "There is so much information we all know from working in the field, but nobody really writes it down, there is no manual, and it is assumed you will pick it up".

We agree wholeheartedly with this statement. In our journey into microbial genomics, we did not identify a single unifying primary or introductory resource to guide the learning process. Further, as adult learners with different experiences and learning styles, we adopted diverse approaches to building our foundational knowledge - there is more than one way to start. Here, we briefly overview our approaches and some worthwhile resources. If you need inspiration, we suggest listening to the MicroBinfie Podcast episodes 82, 84 and 85, which give an excellent historical overview of the field.

A Warning about Alice

Microbial genomics is a rapidly evolving space - be prepared to learn fast, be flexible and know that new theories and tools are emerging. It is easy to get lost exploring rabbit holes whilst trying to understand underlying concepts, critically appraising reports or resolving incongruent theories. The "rabbit hole" is inviting, and its paths dangerously addictive, with many uncertainties. Finding a learning group and a mentor is crucial to keeping you on the right path.

Build the Basics

We highly recommend a basic primer on DNA and RNA bases, the difference between purines and pyrimidines, and mobile genetic elements (e.g., plasmids, insertion sequences and transposons). In-depth knowledge is not required, but these concepts recur throughout microbial genomics.

Start with the Wet Laboratory: generating sequences

Understand Sequencing Approaches and Platforms

We suggest starting with "traditional" Sanger sequencing and moving through the current leading platforms, examining the strengths and pitfalls of each: Illumina, Ion Torrent, PacBio and Oxford Nanopore (ONT). Exploring extraction, library preparation methods and sequencing is essential. The Illumina website has many videos and tutorials about their platform, and ONT also has some. The CDGN webinars are also a "must-see".

Wet Laboratory Quality and Sequencing Adequacy: Quantity, Quality and Integrity

Once you understand how the platforms work, exploring wet laboratory quality metrics should be the next goal. We suggest considering the features of a "good" extraction or library and how contamination can be identified. How much DNA should be present? Are there other contaminants? What size or length should the fragments be?

Bioinformatics

An overview

"In all things bioinformatic, the answer is 'it depends'."

Bioinformatics is the term used to describe the application of tools to analyse and interpret biological data. The language used within this sphere is jargon-heavy, and terms frequently don't follow strict definitions. We think it is important to share that we found this confusing and sometimes frustrating, but it eventually becomes clear(ish).

Command-line vs Graphical User Interfaces (GUI)

Command-line work (using code-based languages) is heavily embedded in bioinformatics. However, there is expanding use of GUIs (i.e., mouse-driven point-and-click interfaces similar to the Windows or Mac operating systems). We strongly suggest initially avoiding the command line and concentrating on free-to-use GUI platforms like Galaxy. Pay-to-use platforms, such as Geneious and CLC Workbench, may be available at your training site.

What is your question?

This is the first consideration in bioinformatics. The tools used are directed by the question being asked. The question, at times, needs to be very specific as tools can have subtle differences.

More Quality

"Garbage in, garbage out"

Assessing the quality of sequences and assemblies as a first step is essential to determine whether the data generated in the wet laboratory are valid. We advocate for "hands-on" exposure to quality outputs and reports from the wet lab where possible. Notably, different settings and microbes require vastly different quality metrics and scores to identify poor-quality sequences, which is again a confusing area. Further, even when a sequence "fails" a particular quality metric, it does not mean it cannot be utilised. The overarching question being asked needs to be considered.
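
One widely used sequence quality measure is the Phred score stored in FASTQ files, where a score Q corresponds to a per-base error probability of 10^(-Q/10). The Python sketch below decodes a quality line and averages the error probabilities. It assumes the common Phred+33 encoding, and the quality string is an invented example rather than real data. It is not a substitute for a dedicated quality-control tool.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality line into per-base Phred scores.

    Assumes Phred+33 encoding (score = ASCII code - 33).
    """
    return [ord(ch) - offset for ch in quality_string]


def mean_error_probability(scores):
    """Average the per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


# Invented quality line: "I" encodes Q40, "5" encodes Q20, "+" encodes Q10.
example_quality = "IIII55++"
scores = phred_scores(example_quality)
print(scores)
print(mean_error_probability(scores))
```

Averaging error probabilities rather than the scores themselves is a deliberate choice here: Phred scores are logarithmic, so a few low-quality bases affect the expected error rate far more than a simple mean of the scores would suggest.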

Essential areas to understand:

  • What are k-mers, what are their benefits and limitations, and how are they used in bioinformatics?

  • Approaches to identifying contamination using wet laboratory and bioinformatic techniques

  • De novo assembly (akin to putting the puzzle pieces back together without a guide) versus mapping to a known reference sequence (placing the pieces on top of something that may look like the original puzzle)

  • What is the difference between a tool and a database, and why are versions important?

  • Considerations in choosing a database include purpose, usability, curation, comprehensiveness, and maintenance

  • Identification: Taxonomic classification

  • Antimicrobial resistance gene detection, virulence factor detection and variant calling

  • Computing resources and security considerations

  • Metadata tiers and importance
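
To make the first item in the list above concrete, a k-mer is simply a substring of length k, and many tools work by counting or comparing k-mers. The Python sketch below is a toy illustration under that definition only. The function names and sequences are invented for this example and do not come from any particular tool.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def shared_kmer_fraction(seq_a, seq_b, k):
    """Jaccard index of two k-mer sets: shared k-mers / all distinct k-mers."""
    kmers_a = set(count_kmers(seq_a, k))
    kmers_b = set(count_kmers(seq_b, k))
    union = kmers_a | kmers_b
    return len(kmers_a & kmers_b) / len(union) if union else 0.0


# Invented toy sequence, far shorter than a real read.
toy_read = "ATGCGATGCA"
kmer_counts = count_kmers(toy_read, 3)
print(kmer_counts)
print(shared_kmer_fraction("ATGCA", "ATGCT", 3))
```

The choice of k is a trade-off: short k-mers occur by chance in unrelated sequences, while long k-mers are more specific but are disrupted by a single sequencing error or mutation.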

Relatedness Assessments: Evolution

Microbes continually change and evolve. Some bioinformatic techniques apply models to predict these changes, aiming to permit a more accurate comparison of sequences. We suggest starting with the Primer series by Paul Lewis on phyloseminar.org. It is a bit heavy on probability theory but worth the effort. The Sydney University-run workshop on Phylogenetics is excellent; pre-reading is highly recommended to get the most out of it.
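
As a small illustration of what "applying a model" can mean, the Python sketch below uses the Jukes-Cantor (JC69) model, one of the simplest substitution models and chosen here only as an example. It corrects the observed proportion of differing sites, p, for multiple substitutions at the same site using d = -(3/4) ln(1 - (4/3) p). The function names are invented for this sketch.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ between two sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jukes_cantor_distance(p):
    """JC69 corrected distance in expected substitutions per site."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - (4 / 3) * p)


# Invented aligned sequences differing at 3 of 10 sites.
observed = p_distance("ACGTACGTAC", "ACGAACGAAT")
print(observed, jukes_cantor_distance(observed))
```

The corrected distance is always at least as large as the observed proportion, because some sites will have changed more than once and those hidden changes are invisible in a direct comparison.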

Relatedness Assessments: Grouping and Splitting

We framed the concepts and techniques used to assess the relatedness of sequences or microbes as grouping and splitting. Public health-related reports and medical research-centred analyses typically focus on initially grouping microbes into broad boxes using taxonomy, multi-locus sequence typing (MLST) or core genome MLST. Within these groups, the members are often further "split", typically by measuring single nucleotide differences. Maximising the size of the core genome and choosing an appropriate reference genome are essential concepts in this area.
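
The grouping-then-splitting idea can be sketched in a few lines of Python. The isolates, sequence types and alignment fragments below are entirely hypothetical, and the sequences are assumed to be already aligned to the same reference. Real analyses rely on dedicated typing and variant-calling tools rather than code like this.

```python
from collections import defaultdict
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count differing positions, ignoring gaps (-) and ambiguous bases (N)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    skip = {"-", "N"}
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and a not in skip and b not in skip
    )


# Hypothetical isolates: (name, sequence type, aligned core-genome fragment).
isolates = [
    ("isolate_1", "ST-A", "ACGTACGTAC"),
    ("isolate_2", "ST-A", "ACGTACGTAT"),
    ("isolate_3", "ST-A", "ACGAACGTAT"),
    ("isolate_4", "ST-B", "TCGTTCGAAC"),
]

# Grouping: place isolates into broad boxes by sequence type.
groups = defaultdict(list)
for name, sequence_type, alignment in isolates:
    groups[sequence_type].append((name, alignment))

# Splitting: pairwise single nucleotide differences within each group.
within_group = {}
for sequence_type, members in groups.items():
    for (name_a, seq_a), (name_b, seq_b) in combinations(members, 2):
        within_group[(name_a, name_b)] = snp_distance(seq_a, seq_b)

print(within_group)
```

Only positions present in every compared sequence can be counted, which is why the size of the core genome and the choice of reference affect the distances obtained.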

Take Home Message

This overview presents our thoughts, formed by our journey and experiences. It is by no means all-inclusive, and components may well be outdated by the time this is published. Your path will almost certainly be different; new tools, techniques and educational resources will continue to emerge in this rapidly evolving field, shaping your experiences. Enjoy the ride. Keep reading books and publications, watching videos and listening to podcasts.

Strength in numbers: talk to your study group – individuals will find and share different resources.

Biology and microbial genomics are not separate; do not get distracted by the tools and forget about the microbes and their molecular biology.

Recommended resources:

YouTube™

Podcasts

  • MicroBinfie Podcast: great podcast and well worth revisiting multiple times during your journey

Websites

There are a number of websites worth exploring depending on the organism:

Online Courses

  • We strongly suggest creating a Galaxy Australia log-in and exploring the tutorials.

  • The University of Melbourne has a MicroCert course based on Galaxy that is also useful. There is a cost associated with this MicroCert.

  • GitHub tutorials: GitHub is to bioinformatics nerds what Instagram is to photo-sharers and Strava is to athletes.

Key Research Papers

  • Boolchandani, M., D'Souza, A.W. & Dantas, G. Sequencing-based methods and resources to study antimicrobial resistance. Nat Rev Genet 20, 356–370 (2019). https://doi.org/10.1038/s41576-019-0108-4

  • Uelze, L., Grützke, J., Borowiak, M. et al. Typing methods based on whole genome sequencing data. One Health Outlook 2, 3 (2020). https://doi.org/10.1186/s42522-020-0010-1

  • Yang, Z., Rannala, B. Molecular phylogenetics: principles and practice. Nat Rev Genet 13, 303–314 (2012). https://doi.org/10.1038/nrg3186

  • Kozyreva, V.K., Truong, C.L., Greninger, A.L. et al. Validation and Implementation of Clinical Laboratory Improvements Act-Compliant Whole-Genome Sequencing in the Public Health Microbiology Laboratory. J Clin Microbiol 55, 2502–2520 (2017). https://doi.org/10.1128/jcm.00361-17

Rahul Ratwatte