# Fitting Distribution in R

**Purpose**

Is to work through FITTING DISTRIBUTIONS WITH R Release 0.4-21 February 2005 It is a 6 year old document but I think I should get my hands dirty a bit to be a plug and play mode.Also , any revisit of this document will make me think in a different way.

There are four steps in fitting distributions 1. Model/funtion choice 2. Estimate parameters 3. Evaluate goodness of fit 4. Goodness of fit statistical tests

**Histogram**

> x.norm <- rnorm(n = 200, m = 10, sd = 2) > hist(x.norm, main = "Histogram of observed data") |

**Density state**

> plot(density(x.norm), main = "Density estimate") |

**ECDF state**

> plot(ecdf(x.norm), main = "Density estimate") |

**This is something I learnt today**

qqplot can be used to check any sample distribution with theoretical distribution

> x.wei <- rweibull(n = 200, shape = 2.1, scale = 1.1) > x.teo <- rweibull(n = 200, shape = 2, scale = 1) > qqplot(x.teo, x.wei, main = "QQ-plot distr. Weibull") > abline(0, 1) |

The biggest challenge is that , curves only differ by mean, variability, skewness and kurtosis. So, if you standardize the variables, the a functional form with both skewness and kurtosis can be used to check the distribution

**Normal**

> curve(dnorm(x, m = 10, sd = 2), from = 0, to = 20, main = "Normal") |

**Gamma**

> curve(dgamma(x, scale = 1.5, shape = 2), from = 0, to = 15, main = "Gamma") |

**Weibull**

> curve(dweibull(x, scale = 2.5, shape = 1.5), from = 0, to = 15, + main = "weibull") |

**Skewness and Kurtosis for Normal Distribution**

> library(fBasics) > skewness(x.norm) [1] 0.0132836 attr(,"method") [1] "moment" > kurtosis(x.norm) [1] -0.1502271 attr(,"method") [1] "excess" |

**Skewness and Kurtosis for Weibull**

> skewness(x.wei) [1] 0.8799243 attr(,"method") [1] "moment" > kurtosis(x.wei) [1] 0.5694369 attr(,"method") [1] "excess" |

**Check normal**

> x <- rnorm(1000, 2, 3) > fitdistr(x, "normal") mean sd 2.02103915 3.06295954 (0.09685929) (0.06848986) |

**Check gamma**

> x <- rgamma(1000, 2, 3) > fitdistr(x, "gamma") shape rate 2.11176055 3.22441614 (0.08800314) (0.15158757) |

**Check pois**

> x <- rpois(1000, 2) > fitdistr(x, "poisson") lambda 1.96500000 (0.04432832) |

Look at the amazing collection that this fitdistr can be used to check the distributions

**Chi-Square goodness of fit**

> n <- 200 > x.pois <- rpois(n, 2.5) > lambda.est <- mean(x.pois) > tab.os <- table(x.pois) > freq.os <- vector() > for (i in 1:length(tab.os)) { + freq.os[i] <- tab.os[[i]] + } > freq.ex <- dpois((0:max(x.pois)), lambda = lambda.est) * n > acc <- mean(abs(freq.os - trunc(freq.ex))) > acc * 100/mean(freq.os) [1] 16.5 |

**I began going through this 24 page document to check whether a particular distribution comes from cauchy.**

Let me generate a gaussian normal variable and check whether it comes from cauchy

> x <- rnorm(1000, 0, 1) > h <- hist(x, breaks = 20) > y <- rcauchy(1000, 0) > y.cut <- cut(y, breaks = h$breaks) > observed.freq <- as.vector(table(y.cut))/1000 > expected.freq <- as.vector(h$density) > temp <- 0 > for (i in seq_along(observed.freq)) { + temp <- temp + ((observed.freq[i] - expected.freq[i])^2)/expected.freq[i] + } > dof <- length(observed.freq) - 2 - 1 > pchisq(temp, dof) [1] 1 |

**This p is extremely small and hence one can reject the null hypothesis.**

Obviously sample from cauchy is vastly different from sample from normal

**Another learning is the package vcd which can be used to check goodness of fit**

> library(vcd) > y <- rpois(1000, 2) > gf <- goodfit(y, type = "poisson", method = "MinChisq") > summary(gf) Goodness-of-fit test for poisson distribution X^2 df P(> X^2) Pearson 5.228552 7 0.6320941 |