To demonstrate my skills in handling data and creating initial visualisations, I analysed a mock dataset supplied by J. Louter (INT/ILC). The data come from an experiment in which adult C. elegans were exposed to varying concentrations of different compounds. After an exposure time of 68 hours, the number of offspring produced by these C. elegans was counted.
First, the data (stored in an Excel file) were read with read_excel() from the readxl package, which is part of the tidyverse. An interactive datatable was then created to show the initial dataset:
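library(readxl)     #read_excel()
library(tidyverse)  #dplyr, stringr, ggplot2 and the %>% pipe
library(DT)         #interactive datatable()
excel_location<-here::here("data.raw/CE.LIQ.FLOW.062_Tidydata.xlsx")
Celegans_data_raw<-read_excel(excel_location, sheet = 1)
datatable(Celegans_data_raw, options = list(
  scrollX=TRUE
))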
After reading in the data, I transformed them so that proper figures could be created. For this, I selected the relevant columns:
expType: the type of experiment (experimental or control)
RawData: the number of offspring the C. elegans produced after 68 hours of exposure to the treatment
compName: the compound to which the C. elegans were exposed
compConcentration: the concentration of the compound to which the C. elegans were exposed
#Dataset inspection and transformation
#Selecting data for this goal
Celegans_data_select<-Celegans_data_raw %>% dplyr::select(expType, compName, RawData, compConcentration)
Celegans_data_select
#Datapoint 259 has a comma instead of a point. Transforming value via str_replace
Celegans_data_select$compConcentration<-Celegans_data_select$compConcentration %>% str_replace(",", ".")
Celegans_data_select
#Now properly transforming the compConcentration data into numeric
Celegans_data_select$compConcentration<-Celegans_data_select$compConcentration %>% as.numeric()
Celegans_data_select
#Transforming expType and compName into a factor variable
levels_exptype<-unique(Celegans_data_select$expType) #Storing all expType levels
Celegans_data_select$expType<-factor(Celegans_data_select$expType, levels = levels_exptype) #Transforming expType->factor
levels_compname<-unique(Celegans_data_select$compName) #Storing all compName levels
Celegans_data_select$compName<-factor(Celegans_data_select$compName, levels = levels_compname) #Transforming compName->factor
Celegans_data_select
#Filtering NA's from RawData
Celegans_data_select<-Celegans_data_select %>% filter(!is.na(RawData))
Celegans_data_select
#Storing the tidied data under a new name
Celegans_data<-Celegans_data_select
#Checking if transformation went properly
datatable(Celegans_data, options = list(
  scrollX=TRUE
))
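The order of the tidying steps matters. The sketch below replays them in base R on a small made-up data frame. The name `toy` and all of its values are hypothetical, and base R's `sub()` stands in for `stringr::str_replace()`. It shows that converting the concentrations straight to numeric turns the comma value into `NA`, that replacing the comma first keeps the value, and that `levels = unique(...)` keeps factor levels in order of first appearance rather than alphabetically.

```r
# Hypothetical stand-in for the four selected columns; values are invented
toy <- data.frame(
  expType = c("experiment", "controlNegative", "experiment", "experiment"),
  compName = c("decane", "S-medium", "decane", "naphthalene"),
  RawData = c(80, 95, 70, NA),
  compConcentration = c("0.5", "0", "1,5", "2"),
  stringsAsFactors = FALSE
)

# Converting directly: "1,5" cannot be parsed and becomes NA (with a warning)
naive <- suppressWarnings(as.numeric(toy$compConcentration))

# Replacing the decimal comma first keeps the value (1.5)
toy$compConcentration <- as.numeric(sub(",", ".", toy$compConcentration, fixed = TRUE))

# levels = unique(...) keeps first-appearance order instead of alphabetical order
toy$expType <- factor(toy$expType, levels = unique(toy$expType))
toy$compName <- factor(toy$compName, levels = unique(toy$compName))

# Drop rows without an offspring count
toy <- toy[!is.na(toy$RawData), ]
```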
After tidying the data from the Excel file, I created exploratory graphs to study the data more thoroughly:
Celegans_data %>% ggplot(aes(x=log10(compConcentration+0.00005), y=RawData))+ #Added 0.00005 because log10(0) is -Inf, so zero concentrations would get no finite x position
  geom_jitter(aes(colour=compName, shape=expType), width = 0.05)+
  theme_bw()+
  labs(
    title="Effect of multiple treatments on offspring production by C. elegans",
    x="log 10 treatment concentration (nM)",
    y="Amount of offspring",
    colour="Treatment",
    shape="Experiment type"
  )
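The small offset inside `log10()` is presumably there because some rows (such as the controls) have a concentration of 0, and `log10(0)` is `-Inf`, which has no finite position on the x-axis. A minimal base-R check with hypothetical concentrations:

```r
# Hypothetical concentrations, including 0 as for untreated rows
conc <- c(0, 0.01, 0.1, 1, 10)

# Without an offset, the zero concentration maps to -Inf
raw_log <- log10(conc)

# With the same offset as in the plot code, every value is finite
shifted <- log10(conc + 0.00005)
```

The offset should stay well below the lowest non-zero concentration tested. Otherwise it also visibly shifts the low-concentration points.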
To compare the effects of the different treatments, I normalised the data against the mean offspring count of the negative control (S-medium) and expressed the result as a percentage.
#Determine the mean of the negative control
negctrl_mean<-Celegans_data_select$RawData[Celegans_data_select$expType=="controlNegative"] %>% mean()
negctrl_mean
#Normalising the data
Celegans_data_select_normalised<-Celegans_data_select %>% mutate(
  normalised_RawData_percentage=RawData/negctrl_mean*100
)
#Plotting the normalised data
Celegans_data_select_normalised %>% ggplot(aes(x=log10(compConcentration+0.00005), y=normalised_RawData_percentage))+
  geom_jitter(aes(colour=compName, shape=expType), width = 0.05)+
  theme_bw()+
  labs(
    title="Normalised effect of multiple treatments on offspring production by C. elegans",
    x="log 10 treatment concentration (nM)",
    y="Normalised amount of offspring (%)",
    colour="Treatment",
    shape="Experiment type"
  )
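The normalisation expresses every offspring count as a percentage of the negative-control mean, normalised = RawData / mean(RawData of the negative control) × 100, so 100% means no difference from S-medium. A minimal sketch with made-up counts (the data frame `offspring` is hypothetical):

```r
# Hypothetical offspring counts; expType labels follow the dataset's naming
offspring <- data.frame(
  expType = c("controlNegative", "controlNegative", "experiment", "experiment"),
  RawData = c(90, 110, 50, 100)
)

# Mean offspring count of the negative control rows
negctrl_mean <- mean(offspring$RawData[offspring$expType == "controlNegative"])

# Each count as a percentage of that mean
offspring$normalised_RawData_percentage <- offspring$RawData / negctrl_mean * 100
```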
For a clearer picture of the trends, I also made a summarised version of the graph that shows the mean normalised offspring per treatment and concentration:
#Creating a summarised version of the data based on treatment and concentration
Celegans_data_normalised_sum<-Celegans_data_select_normalised %>%
  dplyr::filter(!is.na(normalised_RawData_percentage)) %>%
  dplyr::group_by(compConcentration, compName) %>%
  dplyr::summarise(
    mean_normalised=mean(normalised_RawData_percentage),
    .groups = "drop" #Remove the grouping afterwards (also avoids the summarise() grouping message)
  )
#Filtering out S-medium, as that is the negative control used as the reference for normalisation
Celegans_data_normalised_sum<-Celegans_data_normalised_sum %>% filter(compName!="S-medium")
Celegans_data_normalised_sum
#Plotting the summarised data
Celegans_data_normalised_sum %>% ggplot(aes(x=log10(compConcentration+0.00005), y=mean_normalised))+
  geom_point(aes(colour=compName), size=3)+
  geom_line(aes(colour=compName))+
  theme_bw()+
  labs(
    title="Normalised effect of multiple treatments on offspring production by C. elegans",
    x="log 10 treatment concentration (nM)",
    y="Normalised amount of offspring (%)",
    colour="Treatment"
  )
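For comparison, the same per-group summary can be sketched in base R with `aggregate()`, without dplyr. The data frame `normalised`, the name `summ` and all values are invented for illustration; the column names mirror those used above.

```r
# Hypothetical normalised values for one compound plus the negative control
normalised <- data.frame(
  compName = c("decane", "decane", "decane", "decane", "S-medium"),
  compConcentration = c(0.1, 0.1, 1, 1, 0),
  normalised_RawData_percentage = c(95, 105, 40, 60, 100)
)

# Mean normalised offspring per compound and concentration
summ <- aggregate(normalised_RawData_percentage ~ compName + compConcentration,
                  data = normalised, FUN = mean)
names(summ)[names(summ) == "normalised_RawData_percentage"] <- "mean_normalised"

# Drop the negative control, which served as the normalisation reference
summ <- summ[summ$compName != "S-medium", ]
```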
Based on these exploratory graphs, 2,6-diisopropylnaphthalene, decane and naphthalene all appear to decrease the number of offspring produced by C. elegans. Decane only appears to decrease offspring at higher concentrations. 2,6-diisopropylnaphthalene seems to cause a relatively steady decrease as the concentration increases. For naphthalene, the offspring count seems to stay roughly level until extremely high concentrations are used.
Based on these data, an LC50 analysis could be performed.
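To get a first impression of where the 50% point lies, one can estimate the concentration at which the mean normalised offspring crosses 50% by linear interpolation on the log10 scale. This is a crude sketch with hypothetical values (`conc`, `response`), not a dose-response fit. A more rigorous analysis would fit a log-logistic model, for example with `drm()` from the drc package. Because the endpoint here is offspring number rather than survival, this 50% point is often called an EC50.

```r
# Hypothetical summary for one compound: concentration and mean normalised offspring (%)
conc <- c(0.01, 0.1, 1, 10)
response <- c(98, 85, 40, 10)
log_conc <- log10(conc)

# First pair of neighbouring concentrations where the response drops below 50%
i <- which(response[-length(response)] >= 50 & response[-1] < 50)[1]
if (is.na(i)) stop("the response never crosses 50% in this range")

# Linear interpolation between those two points on the log10 scale
frac <- (response[i] - 50) / (response[i] - response[i + 1])
ec50 <- 10^(log_conc[i] + frac * (log_conc[i + 1] - log_conc[i]))
```

Interpolating on the log10 scale matches how the concentrations are plotted above. However, the estimate depends entirely on the two neighbouring points, which is why a fitted curve is preferable for a real analysis.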