ggmanh 1.8.0

The package `ggmanh`

is aimed to provide easy and direct access to visualisation to the GWAS / PWAS results while also providing many functionalities and features.

Manhattan plot is commonly used to display significant Single Nucleotide Polymorphisms (SNPs) in Genome Wide Association Study (GWAS). The x-axis is divided into chromosomes, and SNPs are plotted in their respective positions. The y-axis typically represents \(-10*log(p value)\). Majority of the points have low y-values, with some of the significant SNPs having high y-values. It is not uncommon to see strong association between SNPs and a phenotype, yielding a high y-value. This results in a wide y-scales in which SNPs with lower significance are squished.

This function addresses this problem by rescaling the y-axis according to the range of y. There are more features, such as labelling without overlap (with the help of ggrepel package), reflecting the size of chromosomes along the x-axis, and displaying significant lines.

The package can be installed by:

```
if (!require("BiocManager"))
install.packages("BiocManager")
BiocManager::install("ggmanh")
```

`manhattan_plot()`

is a generic method that can take a `data.frame`

, `MPdata`

, or a `GRanges`

object. The `data.frame`

, at bare minimum, must have three columns containing: `chromosome`

, `position`

, and `p.value`

. For a `GRanges`

object, meta data column name for the p-value needs to be passed.

There are two steps to this function: preprocess data and plot data. The preprocessing step (accomplished with `manhattan_data_preprocess()`

) preprocesses the data by calculating the new x-position to map to the plot (`new_pos`

column added to the data), “thining” the data points, and saving other graphical information needed for manhattan plot, which is returned as a `MPdata`

object. The plot step (accomplished with `manhattan_plot()`

) determines if rescaling of the y-axis is needed and plots / saves the manhattan plot.

While using `manhattan_plot()`

on `data.frame`

is sufficient, it is fine to separately run pre-processing and plotting for customizing the plot without having to preprocess again and again.

```
library(ggmanh)
#> Loading required package: ggplot2
library(SeqArray)
#> Loading required package: gdsfmt
```

First, create a simulated data to be used for demonstration.

```
set.seed(1000)
nsim <- 50000
simdata <- data.frame(
"chromosome" = sample(c(1:22,"X"), size = nsim, replace = TRUE),
"position" = sample(1:100000000, size = nsim),
"P.value" = rbeta(nsim, shape1 = 5, shape2 = 1)^7
)
```

`manhattan_plot`

expects data.frame to have at least three columns: chromosome, position, and p.value.

```
head(simdata)
#> chromosome position P.value
#> 1 16 41575779 0.135933290
#> 2 4 73447172 0.764033749
#> 3 11 82120979 0.002440878
#> 4 22 94419970 0.644460838
#> 5 19 38141341 0.184945910
#> 6 3 43235060 0.774330251
```

To avoid ambiguity in plotting, it is recommended that that the chromosome column is passed as a factor, or `chr.order`

is specified.

`simdata$chromosome <- factor(simdata$chromosome, c(1:22,"X"))`

This is the bare minimum to plot manhattan plot, and `manhattan_plot`

can handle the rest.

```
g <- manhattan_plot(x = simdata, pval.colname = "P.value", chr.colname = "chromosome", pos.colname = "position", plot.title = "Simulated P.Values", y.label = "P")
g
```

`manhattan_plot`

is also defaulted to display the GWAS p.value threshold at `5e-8`

and `5e-7`

. For now, the threshold is required; the values and color can be customized.

The function is also suited to rescale the y-axis depending on the magnitude of p values.

Let’s suppose that there are signals from chromosome 5 and 21, and the significant p-value is low for chromosome 21 and even lower for chromosome 5.

```
tmpdata <- data.frame(
"chromosome" = c(rep(5, 10), rep(21, 5)),
"position" = c(sample(250000:250100, 10, replace = FALSE), sample(590000:600000, 5, replace = FALSE)),
"P.value" = c(10^-(rnorm(10, 100, 3)), 10^-rnorm(5, 9, 1))
)
simdata <- rbind(simdata, tmpdata)
simdata$chromosome <- factor(simdata$chromosome, c(1:22,"X"))
```

```
g <- manhattan_plot(x = simdata, pval.colname = "P.value", chr.colname = "chromosome", pos.colname = "position", plot.title = "Simulated P.Values - Significant", rescale = FALSE)
g
```