AlberdiLab | Manuscript in prep
Mediterranean podarcis metagenomics
2024-11-14
Chapter 1 Introduction
1.1 Prepare the R environment
1.1.1 Environment
To reproduce all the analyses locally, clone this repository to your computer using:
RStudio > New Project > Version Control > Git
and indicate the following git repository:
Once the R project has been created, follow the instructions and code chunks shown in this webbook.
1.1.2 Libraries
The following R packages are required for the data analysis.
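Before loading the packages below, it can help to see which ones are not yet installed. The following is a minimal sketch, not part of the original workflow: `missing_pkgs()` is a hypothetical helper and the package vector is shortened for illustration.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
# requireNamespace() returns FALSE (instead of erroring) when a package is absent.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Shortened, illustrative subset of the packages listed in this section
required <- c("tidyverse", "ape", "vegan", "Hmsc")
missing_pkgs(required)
```

An empty result means every listed package can be loaded. How to install the missing ones depends on their source (CRAN, Bioconductor or GitHub), which this sketch does not cover.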
# Base
library(R.utils)
library(knitr)
library(tidyverse)
library(devtools)
library(tinytable)
library(rairtable)
# For tree handling
library(ape)
library(phyloseq)
library(phytools)
# For plotting
library(ggplot2)
library(ggrepel)
library(ggpubr)
library(ggnewscale)
library(gridExtra)
library(ggtreeExtra)
library(ggtree)
library(ggh4x)
# For statistics
library(spaa)
library(vegan)
library(Rtsne)
library(geiger)
library(hilldiv2)
library(distillR)
library(broom.mixed)
#library(lmerTest)
library(Hmsc)
library(corrplot)
Data preparation
1.2 Podarcis filfolensis (PF)
1.2.0.6 Genome annotations
Downloading individual annotation files from ERDA using information in Airtable and writing them to a single compressed table takes a while. The following chunk only needs to be run once, to generate the genome_annotations table that is saved in the data directory. Note that the Airtable connection requires a personal access token.
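Since the download only needs to happen once, the chunk can be guarded by a check on the output file. This is a minimal sketch under that assumption; `needs_download()` is a hypothetical helper and not part of the original code.

```r
# Hypothetical helper: TRUE when the cached table has not been written yet
needs_download <- function(path) !file.exists(path)

# Path written by the download chunk in this section
annotations_file <- "data/pf/genome_annotations.tsv.xz"

if (needs_download(annotations_file)) {
  message("No cached table at ", annotations_file, "; run the download chunk.")
} else {
  message("Cached table found at ", annotations_file, "; skipping the download.")
}
```

The download pipeline itself is shown next.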
airtable("MAGs", "appWbHBNLE6iAsMRV") %>% #get base ID from Airtable browser URL
read_airtable(., fields = c("ID","mag_name","number_genes","anno_url"), id_to_col = TRUE) %>% #get 4 fields from MAGs table
filter(mag_name %in% paste0(genome_metadata_pf$genome,".fa")) %>% #filter by MAG name
filter(number_genes > 0) %>% #genes need to exist
select(anno_url) %>% #list MAG annotation urls
pull() %>%
read_tsv() %>% #load all tables
rename(gene=1, genome=2, contig=3) %>% #rename first 3 columns
write_tsv(file="data/pf/genome_annotations.tsv.xz") #write to overall compressed file
1.2.0.7 Create working objects
Transform the original data files into working objects for downstream analyses.
1.2.0.9 Transform reads into genome counts
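The code for this step is not included in this section. As an illustration only, the sketch below assumes that read counts are converted to genome counts by dividing them by genome length expressed in read-length units. The toy table, the 150 bp read length and the helper name `reads_to_genome_counts()` are all hypothetical.

```r
# Toy read-count table: one row per genome, one column per sample (hypothetical values)
read_counts <- data.frame(
  genome = c("g1", "g2"),
  s1 = c(300, 1000),
  s2 = c(0, 500)
)

# Hypothetical genome lengths in base pairs, named by genome
genome_lengths <- c(g1 = 3e6, g2 = 1.5e6)

# Assumed normalisation: reads / (genome length / read length)
reads_to_genome_counts <- function(counts, lengths, read_length = 150) {
  samples <- setdiff(names(counts), "genome")
  reads_per_genome <- unname(lengths[counts$genome]) / read_length
  counts[samples] <- lapply(counts[samples], function(x) x / reads_per_genome)
  counts
}

genome_counts <- reads_to_genome_counts(read_counts, genome_lengths)
genome_counts
```

Dividing by genome length keeps large genomes from appearing more abundant simply because they recruit more reads.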
1.2.0.11 Prepare color scheme
AlberdiLab projects use unified color schemes developed for the Earth Hologenome Initiative, to facilitate figure interpretation.
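The chunk that follows ends with `pull(colors, name=phylum)`, which returns a named character vector of colors keyed by phylum. As a minimal sketch of how such a vector can be used, with made-up phyla and hex codes rather than the actual EHI values:

```r
# Hypothetical named color vector, shaped like the output of pull(colors, name = phylum)
phylum_colors <- c(
  Bacillota = "#1b9e77",
  Bacteroidota = "#d95f02",
  Pseudomonadota = "#7570b3"
)

# Hypothetical phylum assignment of three genomes
genome_phyla <- c("Bacteroidota", "Bacillota", "Bacteroidota")

# Look up one color per genome by phylum name
genome_colors <- unname(phylum_colors[genome_phyla])
genome_colors
```

In ggplot2, a named vector like this can be passed as `values` to `scale_fill_manual()` or `scale_color_manual()`, which match colors to factor levels by name.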
phylum_colors_pf <- read_tsv("https://raw.githubusercontent.com/earthhologenome/EHI_taxonomy_colour/main/ehi_phylum_colors.tsv") %>%
right_join(genome_metadata_pf, by=join_by(phylum == phylum)) %>%
arrange(match(genome, genome_tree_pf$tip.label)) %>%
select(phylum, colors) %>%
unique() %>%
arrange(phylum) %>%
pull(colors, name=phylum)
1.3 Podarcis gaigeae (PG)
1.3.0.6 Genome annotations
Downloading individual annotation files from ERDA using information in Airtable and writing them to a single compressed table takes a while. The following chunk only needs to be run once, to generate the genome_annotations table that is saved in the data directory. Note that the Airtable connection requires a personal access token.
airtable("MAGs", "appWbHBNLE6iAsMRV") %>% #get base ID from Airtable browser URL
read_airtable(., fields = c("ID","mag_name","number_genes","anno_url"), id_to_col = TRUE) %>% #get 4 fields from MAGs table
filter(mag_name %in% paste0(genome_metadata_pg$genome,".fa")) %>% #filter by MAG name
filter(number_genes > 0) %>% #genes need to exist
select(anno_url) %>% #list MAG annotation urls
pull() %>%
read_tsv() %>% #load all tables
rename(gene=1, genome=2, contig=3) %>% #rename first 3 columns
write_tsv(file="data/pg/genome_annotations.tsv.xz") #write to overall compressed file
1.3.0.7 Create working objects
Transform the original data files into working objects for downstream analyses.
1.3.0.9 Transform reads into genome counts
1.3.0.11 Prepare color scheme
AlberdiLab projects use unified color schemes developed for the Earth Hologenome Initiative, to facilitate figure interpretation.
phylum_colors_pg <- read_tsv("https://raw.githubusercontent.com/earthhologenome/EHI_taxonomy_colour/main/ehi_phylum_colors.tsv") %>%
right_join(genome_metadata_pg, by=join_by(phylum == phylum)) %>%
arrange(match(genome, genome_tree_pg$tip.label)) %>%
select(phylum, colors) %>%
unique() %>%
arrange(phylum) %>%
pull(colors, name=phylum)
1.4 Podarcis milensis (PM)
1.4.0.6 Genome annotations
Downloading individual annotation files from ERDA using information in Airtable and writing them to a single compressed table takes a while. The following chunk only needs to be run once, to generate the genome_annotations table that is saved in the data directory. Note that the Airtable connection requires a personal access token.
airtable("MAGs", "appWbHBNLE6iAsMRV") %>% #get base ID from Airtable browser URL
read_airtable(., fields = c("ID","mag_name","number_genes","anno_url"), id_to_col = TRUE) %>% #get 4 fields from MAGs table
filter(mag_name %in% paste0(genome_metadata_pm$genome,".fa")) %>% #filter by MAG name
filter(number_genes > 0) %>% #genes need to exist
select(anno_url) %>% #list MAG annotation urls
pull() %>%
read_tsv() %>% #load all tables
rename(gene=1, genome=2, contig=3) %>% #rename first 3 columns
write_tsv(file="data/pm/genome_annotations.tsv.xz") #write to overall compressed file
1.4.0.7 Create working objects
Transform the original data files into working objects for downstream analyses.
1.4.0.9 Transform reads into genome counts
1.4.0.11 Prepare color scheme
AlberdiLab projects use unified color schemes developed for the Earth Hologenome Initiative, to facilitate figure interpretation.
phylum_colors_pm <- read_tsv("https://raw.githubusercontent.com/earthhologenome/EHI_taxonomy_colour/main/ehi_phylum_colors.tsv") %>%
right_join(genome_metadata_pm, by=join_by(phylum == phylum)) %>%
arrange(match(genome, genome_tree_pm$tip.label)) %>%
select(phylum, colors) %>%
unique() %>%
arrange(phylum) %>%
pull(colors, name=phylum)
1.5 Podarcis pityusensis (PP)
1.5.0.6 Genome annotations
Downloading individual annotation files from ERDA using information in Airtable and writing them to a single compressed table takes a while. The following chunk only needs to be run once, to generate the genome_annotations table that is saved in the data directory. Note that the Airtable connection requires a personal access token.
airtable("MAGs", "appWbHBNLE6iAsMRV") %>% #get base ID from Airtable browser URL
read_airtable(., fields = c("ID","mag_name","number_genes","anno_url"), id_to_col = TRUE) %>% #get 4 fields from MAGs table
filter(mag_name %in% paste0(genome_metadata_pp$genome,".fa")) %>% #filter by MAG name
filter(number_genes > 0) %>% #genes need to exist
select(anno_url) %>% #list MAG annotation urls
pull() %>%
read_tsv() %>% #load all tables
rename(gene=1, genome=2, contig=3) %>% #rename first 3 columns
write_tsv(file="data/pp/genome_annotations.tsv.xz") #write to overall compressed file
1.5.0.7 Create working objects
Transform the original data files into working objects for downstream analyses.
1.5.0.9 Transform reads into genome counts
1.5.0.11 Prepare color scheme
AlberdiLab projects use unified color schemes developed for the Earth Hologenome Initiative, to facilitate figure interpretation.
phylum_colors_pp <- read_tsv("https://raw.githubusercontent.com/earthhologenome/EHI_taxonomy_colour/main/ehi_phylum_colors.tsv") %>%
right_join(genome_metadata_pp, by=join_by(phylum == phylum)) %>%
arrange(match(genome, genome_tree_pp$tip.label)) %>%
select(phylum, colors) %>%
unique() %>%
arrange(phylum) %>%
pull(colors, name=phylum)
1.6 All
1.6.0.1 Sample metadata
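The chunk below inserts a species code after the first underscore of each population label, presumably so that labels stay distinguishable once the four species are combined. The sketch reproduces that replacement on hypothetical labels, using base R `sub()` with `fixed = TRUE`, which, like `str_replace()`, only changes the first match.

```r
# Hypothetical population labels; the real values come from the metadata file
population <- c("pop_1", "pop_2_a")

# Replace only the first underscore with "_pf_" (the code used for Podarcis filfolensis)
tagged <- sub("_", "_pf_", population, fixed = TRUE)
tagged
```

Later underscores in a label are left untouched, so only one species code is added per label.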
sample_metadata_all <- read_tsv("data/all/DMB0162_metadata.tsv.gz") %>%
rename(sample=1) %>%
mutate(
population = case_when(
species == "Podarcis filfolensis" ~ str_replace(population,"_","_pf_"),
species == "Podarcis gaigeae" ~ str_replace(population,"_","_pg_"),
species == "Podarcis milensis" ~ str_replace(population,"_","_pm_"),
species == "Podarcis pityusensis" ~ str_replace(population,"_","_pp_"),
TRUE ~ NA_character_))
1.6.0.7 Create working objects
Transform the original data files into working objects for downstream analyses.
1.6.0.9 Transform reads into genome counts
1.6.0.11 Prepare color scheme
AlberdiLab projects use unified color schemes developed for the Earth Hologenome Initiative, to facilitate figure interpretation.
phylum_colors_all <- read_tsv("https://raw.githubusercontent.com/earthhologenome/EHI_taxonomy_colour/main/ehi_phylum_colors.tsv") %>%
right_join(genome_metadata_all, by=join_by(phylum == phylum)) %>%
arrange(match(genome, genome_tree_all$tip.label)) %>%
select(phylum, colors) %>%
unique() %>%
arrange(phylum) %>%
pull(colors, name=phylum)
University of Copenhagen, antton.alberdi@sund.ku.dk