Fit of univariate distributions to non-censored data

Laure Ngouanfo Fopa
Curious
Published in
4 min readNov 20, 2020

--

[under review]

If you wish to figure out the distribution (and so its corresponding parameters, e.g normal distribution has mean and variance as parameters) your univariate data may best fit into, “fitdistrplus” helps to generate estimate of these parameters corresponding to a specified distribution. It fits univariate distributions to non-censored data by maximum likelihood (mle), moment matching (mme), quantile matching (qme) or maximizing goodness-of-fit estimation (mge). The latter is also known as minimizing distance estimation. Generic methods are print, plot, summary, quantile, logLik, vcov and coef.

1. fit a gamma distribution by maximum likelihood estimation

Install the package “fitdistrplus”. Use data groundbeef in the package “fitdistrplus”. groundbeef is a data of one variable “serving” (univariate data).

data(groundbeef)
serving <- groundbeef$serving
fitg <- fitdist(serving, “gamma”)

view(fitg)

summary(fitg) #this produces estimates of the parameters of the gamma distribution which are the shape and the rate.

plot(fitg)

plot(fitg, demp = TRUE)

plot(fitg, histo = FALSE, demp = TRUE)

cdfcomp(fitg, addlegend=FALSE)

denscomp(fitg, addlegend=FALSE)

ppcomp(fitg, addlegend=FALSE)

qqcomp(fitg, addlegend=FALSE)

2. use the moment matching estimation (using a closed formula)

fitgmme <- fitdist(serving, “gamma”, method=”mme”)
summary(fitgmme)

plot(fitgmme, demp=T)

3. Comparison of various fits

fitW <- fitdist(serving, “weibull”)
fitg <- fitdist(serving, “gamma”)
fitln <- fitdist(serving, “lnorm”)
summary(fitW)
summary(fitg)
summary(fitln)

cdfcomp(list(fitW, fitg, fitln), legendtext=c(“Weibull”, “gamma”, “lognormal”))

denscomp(list(fitW, fitg, fitln), legendtext=c(“Weibull”, “gamma”, “lognormal”))

qqcomp(list(fitW, fitg, fitln), legendtext=c(“Weibull”, “gamma”, “lognormal”))

ppcomp(list(fitW, fitg, fitln), legendtext=c(“Weibull”, “gamma”, “lognormal”))

gofstat(list(fitW, fitg, fitln), fitnames=c(“Weibull”, “gamma”, “lognormal”))

4. defining your own distribution functions, here for the Gumbel distribution
for other distributions, see the CRAN task view dedicated to probability distributions

dgumbel <- function(x, a, b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
pgumbel <- function(q, a, b) exp(-exp((a-q)/b))
qgumbel <- function(p, a, b) a-b*log(-log(p))

fitgumbel <- fitdist(serving, “gumbel”, start=list(a=10, b=10))
summary(fitgumbel)

plot(fitgumbel, demp=T)

5. fit discrete distributions (Poisson and negative binomial)

data(toxocara)
number <- toxocara$number
fitp <- fitdist(number,”pois”)
summary(fitp)

plot(fitp)

fitnb <- fitdist(number,”nbinom”)
summary(fitnb)

plot(fitnb, demp=T)

cdfcomp(list(fitp,fitnb))

gofstat(list(fitp,fitnb))

6. how to change the optimisation method?

data(groundbeef)
serving <- groundbeef$serving
fitdist(serving, “gamma”, optim.method=”Nelder-Mead”)
fitdist(serving, “gamma”, optim.method=”BFGS”)
fitdist(serving, “gamma”, optim.method=”SANN”)

Find out more here:

https://rdrr.io/cran/fitdistrplus/man/fitdist.html

--

--

Laure Ngouanfo Fopa
Curious

Teaching/Research Assistant in Mathematics/(Bayesian) Statistics, Writer