Chapter 3 Prepare data
3.1 Load data
Load the original data files produced by the bioinformatic pipeline.
3.1.7 Genome tree
genome_tree <- read_tree("data/metagenomics/genome_tree.tre")
genome_tree$tip.label <- str_replace_all(genome_tree$tip.label, "'", "") # remove single quotes in MAG names
genome_tree$tip.label <- str_remove(genome_tree$tip.label, "\\.fa$") # remove .fa suffix
genome_tree <- keep.tip(genome_tree, tip = genome_taxonomy$genome) # keep only MAG tips
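The label clean-up and pruning steps above can be tried on a toy tree before running them on the real data. The following is a minimal sketch, assuming the `ape` and `stringr` packages are installed; `toy_tree` and `toy_taxonomy` are hypothetical stand-ins for `genome_tree` and `genome_taxonomy`, and the explicit check for unmatched genomes is an addition that is not part of the original workflow.

```r
library(ape)
library(stringr)

# Toy three-tip tree; the labels are overwritten to mimic raw tip names
# (wrapped in single quotes and carrying a .fa suffix).
toy_tree <- read.tree(text = "(A:0.1,(B:0.2,C:0.3):0.1);")
toy_tree$tip.label <- c("'MAG1.fa'", "'MAG2.fa'", "'MAG3.fa'")

# Hypothetical genome table in which only two of the three genomes are retained.
toy_taxonomy <- data.frame(genome = c("MAG1", "MAG3"))

# Same clean-up as above: strip single quotes, then the .fa suffix.
toy_tree$tip.label <- str_replace_all(toy_tree$tip.label, "'", "")
toy_tree$tip.label <- str_remove(toy_tree$tip.label, "\\.fa$")

# Added check (assumption): make sure every genome has a matching tip
# before pruning, so that a naming mismatch is reported explicitly.
missing_tips <- setdiff(toy_taxonomy$genome, toy_tree$tip.label)
stopifnot(length(missing_tips) == 0)

# Keep only the tips listed in the genome table.
toy_tree <- keep.tip(toy_tree, tip = toy_taxonomy$genome)
toy_tree$tip.label
```

Running the check before `keep.tip()` helps to spot genomes whose names still differ from the tip labels, for example because of a different file suffix.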
3.2 Create working objects
Transform the original data files into working objects for downstream analyses.
3.2.2 Transform reads into genome counts
3.3 Prepare color scheme
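AlberdiLab projects use unified color schemes developed for the Earth Hologenome Initiative to facilitate figure interpretation. The code below retrieves the phylum colors and builds a named vector restricted to the phyla present in the genome metadata.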
phylum_colors <- read_tsv("https://raw.githubusercontent.com/earthhologenome/EHI_taxonomy_colour/main/ehi_phylum_colors.tsv") %>%
  right_join(genome_metadata, by=join_by(phylum == phylum)) %>%
  arrange(match(genome, genome_tree$tip.label)) %>%
  select(phylum, colors) %>%
  unique() %>%
  arrange(phylum) %>%
  pull(colors, name=phylum)
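To illustrate what the resulting object looks like, the same join can be reproduced offline with small made-up tables. This is only a sketch, assuming `dplyr` (version 1.1.0 or later, for `join_by()`) and `tibble` are installed; `toy_colors` and `toy_metadata` are hypothetical stand-ins for the EHI color table and `genome_metadata`, and the hex codes are arbitrary placeholders rather than the actual EHI colors.

```r
library(dplyr)
library(tibble)

# Hypothetical color table: one row per phylum, with an arbitrary hex code.
toy_colors <- tibble(
  phylum = c("p__Bacillota", "p__Bacteroidota", "p__Pseudomonadota"),
  colors = c("#1b9e77", "#d95f02", "#7570b3")
)

# Hypothetical genome metadata: three genomes belonging to two phyla.
toy_metadata <- tibble(
  genome = c("MAG1", "MAG2", "MAG3"),
  phylum = c("p__Bacteroidota", "p__Bacillota", "p__Bacteroidota")
)

toy_phylum_colors <- toy_colors %>%
  right_join(toy_metadata, by = join_by(phylum)) %>% # keep only phyla present in the metadata
  select(phylum, colors) %>%
  distinct() %>%                                     # one row per phylum
  arrange(phylum) %>%                                # alphabetical order
  pull(colors, name = phylum)                        # named vector: phylum name -> color

toy_phylum_colors
```

The result is a named character vector with one color per phylum found in the metadata, which is the form accepted by the `values` argument of `ggplot2` manual scales such as `scale_fill_manual()`.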