Maximum likelihood estimation (MLE)
Maximum likelihood estimation (MLE) is a popular method for fitting a mathematical model (a probability distribution) to empirical data. When the form of the distribution is known, MLE obtains the best parameters by maximizing the probability of obtaining the observed samples.
Assume we have made N measurements of x, {x_1, x_2, …, x_N}, that we know the pdf that describes x, f(x, α), and that we want to determine the parameter α.
We pick the α that maximizes the probability (the likelihood) of obtaining the measurements we actually observed.
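As a minimal sketch of this idea, assume (purely for illustration; this choice of pdf is not taken from the text above) that f(x, α) = α exp(−α x). The likelihood is the product of f(x_i, α) over the N measurements, so the log-likelihood is N ln α − α Σ x_i, which is maximized at α = N / Σ x_i. The function names and sample values below are hypothetical.

```python
import math

def exp_log_likelihood(samples, alpha):
    """Log-likelihood of i.i.d. samples under the assumed pdf
    f(x, alpha) = alpha * exp(-alpha * x)."""
    return len(samples) * math.log(alpha) - alpha * sum(samples)

def exp_mle(samples):
    """Closed-form MLE of the exponential rate: alpha_hat = N / sum(x_i)."""
    return len(samples) / sum(samples)

# Hypothetical measurements, for illustration only (not data from the traces).
samples = [0.5, 1.2, 0.3, 2.0, 1.0]
alpha_hat = exp_mle(samples)

# Sanity check: the log-likelihood at alpha_hat is at least as large
# as at a few other candidate values of alpha.
for alpha in (0.5 * alpha_hat, 0.9 * alpha_hat, 1.1 * alpha_hat, 2.0 * alpha_hat):
    assert exp_log_likelihood(samples, alpha_hat) >= exp_log_likelihood(samples, alpha)
```

For distributions without a closed-form estimate, the same log-likelihood function would instead be handed to a numerical optimizer.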
Akaike test
We use Akaike's information criterion (AIC) as a model (distribution) selection criterion. AIC is based on Akaike's finding of a formal relationship between Kullback-Leibler information (a dominant paradigm in information and coding theory) and likelihood theory (the dominant paradigm in statistics) [1].
The Akaike test gives an estimate of the expected, relative distance between the fitted distribution and the unknown true distribution that generated the observed data. In this test, the best-fitting distribution is the one with the minimum Akaike's Information Criterion (AIC) value or, equivalently, the Akaike Weight (AW) closest to 1.
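The selection rule can be sketched as follows, assuming the standard definitions AIC = 2k − 2 ln L (k fitted parameters, L the maximized likelihood) and Akaike weight w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), where Δ_i = AIC_i − AIC_min. The log-likelihood values below are made up for illustration and are not results from the traces; the function names are hypothetical.

```python
import math

def aic(log_likelihood, num_params):
    """AIC = 2k - 2 ln(L), where ln(L) is the maximized log-likelihood."""
    return 2 * num_params - 2 * log_likelihood

def akaike_weights(aic_values):
    """Akaike weights: relative likelihoods exp(-delta/2), normalized to sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical (maximized log-likelihood, number of parameters) per candidate.
candidates = {
    "exponential": (-120.0, 1),
    "lognormal": (-112.0, 2),
    "weibull": (-113.5, 2),
}

names = list(candidates)
aics = [aic(ll, k) for ll, k in candidates.values()]
weights = akaike_weights(aics)

# The best-fitting candidate has the minimum AIC (and the largest weight).
best_model = names[aics.index(min(aics))]
```

Subtracting the minimum AIC before exponentiating keeps the weights numerically stable when the raw AIC values are large.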
Examples
We analyze three empirical data sets of human inter-contact times (ICTs). The data sets are traces taken at UCSD, Dartmouth University, and Infocom 2005. The UCSD data records the mobility patterns of 275 wireless PDA users within a campus WiFi network over 11 weeks. The Dartmouth data covers thousands of laptop/PDA users using campus WiFi networks over a period of years. The Infocom 2005 experiment recorded inter-contact information for 41 iMotes (Bluetooth devices) carried by conference attendees for 3 to 4 days. We pre-process all of the data in each trace except Dartmouth, for which we pre-process one month's worth of contact information.
Flight Length
The following figures show the results of fitting short-tailed distributions for (a) NCSU, (b) KAIST, (c) NYX and (d) Disney World.
Akaike test results are shown in the table below.
The following figures show the results of fitting subexponential-class distributions for (a) NCSU, (b) KAIST, (c) NYX and (d) Disney World.
Akaike test results are shown in the table below.
ICT
We apply MLE to fit six well-known distributions to the complementary cumulative distribution function (CCDF) of the ICT distributions produced from the traces: exponential, lognormal, power law, gamma, Weibull, and truncated Pareto.
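For reference, the CCDF of a random variable X is P(X > x). A minimal sketch of computing the empirical CCDF from a list of ICT samples is shown below; the function name, the strict-inequality convention, and the sample values are assumptions made for illustration, not the pre-processing actually used on the traces.

```python
def empirical_ccdf(samples):
    """Return (x, P(X > x)) pairs for each distinct sample value, in ascending order."""
    ordered = sorted(samples)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered):
        # For repeated values, emit a single point at the last occurrence.
        if i + 1 < n and ordered[i + 1] == x:
            continue
        # Exactly n - (i + 1) samples are strictly greater than x.
        points.append((x, (n - (i + 1)) / n))
    return points

# Hypothetical inter-contact times in seconds, for illustration only.
ict_samples = [10, 20, 20, 40]
ccdf_points = empirical_ccdf(ict_samples)
```

A fitted distribution's CCDF can then be evaluated at the same x values and compared against these points.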
References
[1] K. P. Burnham and D. R. Anderson, Multimodel inference: understanding AIC and BIC in model selection, Sociological Methods and Research 33, 261–304 (2004).