distributions

All of the distributions that are provided in the Apache Commons Math project are supported here, in multiple forms.

Continuous or Discrete

These distributions break down into two main categories:

Continuous Distributions

These are distributions over real numbers like 23.4323, with continuity across the values. Each of the continuous distributions can provide samples that fall on an interval of the real number line. Continuous probability distributions include the Normal distribution and the Exponential distribution, among many others.

Discrete Distributions

Discrete distributions, also known as integer distributions, have only whole-number valued samples. These distributions include the Binomial distribution, the Zipf distribution, and the Poisson distribution, among others.

Hashed or Mapped

Hashed Samples

Generally, you will want to “randomly sample” from a probability distribution. This is handled automatically by the functions below if you do not override the defaults. Hash mode is the default sampling mode for probability distributions: the input value is hashed internally, and the resulting unit interval variate is then mapped into the sampling curve. VirtData calls this the hash sampling mode. You can put hash into the modifiers as explained below if you want to document it explicitly.
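As a rough illustration of hash mode, the sketch below scrambles each input before reading the sampling curve, so consecutive inputs yield scattered but repeatable samples. The mixer here is the SplitMix64 finalizer, chosen only as a stand-in; it is an assumption and not necessarily the hash VirtData uses. All function names are hypothetical.

```python
import math
from typing import Callable

MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """Stand-in 64-bit mixer (the SplitMix64 finalizer). Assumed for illustration only."""
    x &= MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def hashed_unit_variate(cycle: int) -> float:
    """Hash the input, then scale the top 53 bits into [0.0, 1.0)."""
    return (mix64(cycle) >> 11) / float(1 << 53)


def hashed_sample(cycle: int, quantile: Callable[[float], float]) -> float:
    """Hash mode: scramble the input first, then read the sampling curve."""
    return quantile(hashed_unit_variate(cycle))


def exponential_quantile(u: float, mean: float = 1.0) -> float:
    """Inverse CDF of an exponential distribution with the given mean."""
    return -mean * math.log1p(-u)


# Sequential inputs produce scattered samples, yet each input always
# produces the same sample.
samples = [hashed_sample(cycle, exponential_quantile) for cycle in range(10_000)]
```

Because the hash is deterministic, a test that replays the same inputs sees the same samples, while the overall stream still follows the shape of the distribution.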

Mapped Samples

The method used to sample from these distributions depends on a mathematical function called the cumulative distribution function (CDF), or more specifically its inverse. Having this function computed over some interval allows one to sample the shape of a distribution progressively if desired. In other words, it allows for a percentile-like view of values within a given probability distribution. Using the inverse cumulative distribution function directly is known as the map mode in VirtData, as it maps a unit interval variate deterministically onto the sampling curve of the distribution. To enable this mode, simply pass map as one of the function modifiers for any function in this category.
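To make the percentile-like behavior concrete, here is a minimal sketch of map mode for an exponential distribution, whose inverse CDF has the closed form -mean * ln(1 - u). The scaling of the long input into the unit interval and the function names are assumptions made for illustration, not VirtData's actual code.

```python
import math

MAX_LONG = (1 << 63) - 1  # same value as Java's Long.MAX_VALUE


def unit_variate(value: int) -> float:
    """Scale a non-negative long into [0.0, 1.0) using its top 53 bits."""
    if not 0 <= value <= MAX_LONG:
        raise ValueError("expected a non-negative 64-bit long")
    return (value >> 10) / float(1 << 53)


def mapped_exponential(value: int, mean: float) -> float:
    """Map mode: feed the scaled input straight into the inverse CDF, with no hashing."""
    return -mean * math.log1p(-unit_variate(value))


# Stepping evenly through the input range walks the percentiles in order:
# the 0th, 10th, 20th, ... 90th percentile of the distribution.
percentiles = [mapped_exponential(MAX_LONG // 100 * p, mean=2.0) for p in range(0, 100, 10)]
```

An input halfway through the range lands on the median, and larger inputs always give larger samples, which is what distinguishes this mode from hashed sampling.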

Interpolated or Computed Samples

When sampling from mathematical models of probability densities, performance between different densities can vary drastically. This means that you may end up perturbing the results of your test in an unexpected way simply by changing parameters of your testing distributions. Even worse, some densities have painful corner cases in performance, like ‘Zipf’, which can make tests unbearably slow and flawed as they chew up CPU resources.

Interpolated Samples

For this reason, interpolation is built into these sampling functions. The default mode is interpolate. This means that the sampling function is pre-computed over 1000 equidistant points in the unit interval, and the result is shared among all threads as a lookup table for interpolation. This makes all statistical sampling functions perform nearly identically at runtime (after initialization, a one-time cost). This does have the minor side effect of a small loss in accuracy, but the difference is generally negligible for nearly all performance testing cases.
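The idea can be sketched as a lookup table plus linear interpolation. The sketch below is an assumption-laden illustration rather than VirtData's implementation: the names are hypothetical, linear interpolation between neighbors is assumed, and the upper endpoint is clamped just below 1.0 so that unbounded distributions stay finite.

```python
import math
from typing import Callable, List


def build_table(quantile: Callable[[float], float], points: int = 1000) -> List[float]:
    """Pre-compute the sampling curve at equidistant points of the unit interval."""
    top = 1.0 - 2.0 ** -53  # largest double below 1.0; an assumed way to keep the tail finite
    return [quantile(min(i / (points - 1), top)) for i in range(points)]


def interpolate(table: List[float], u: float) -> float:
    """Linearly interpolate between the two table entries that bracket u in [0.0, 1.0]."""
    pos = u * (len(table) - 1)
    lo = min(int(pos), len(table) - 2)
    frac = pos - lo
    return table[lo] + frac * (table[lo + 1] - table[lo])


def exp_quantile(u: float) -> float:
    """Inverse CDF of an exponential distribution with mean 1.0."""
    return -math.log1p(-u)


# One-time initialization cost; afterwards every sample is a cheap table read.
TABLE = build_table(exp_quantile)
median = interpolate(TABLE, 0.5)
```

After the table is built, the cost of a sample no longer depends on which distribution was tabulated, only on the table read and one multiply-add.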

Computed Samples

Conversely, compute mode sampling calls the sampling function every time a sample is needed. This affords a little more accuracy, but is generally not preferable to the default interpolated mode. You’ll know if you need computed samples. Otherwise, it’s best to stick with interpolation so that you spend more time testing your target system and less time testing your data generation functions.

Input Range

All of these functions take a long as the input value for sampling. This is similar to how the unit interval (0.0,1.0) is used in mathematics and statistics, but more tailored to modern system capabilities. Instead of using the unit interval, we simply use the interval of all positive longs. This provides more compatibility with other functions in VirtData, including hashing functions.
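One way to picture this relationship is to place a long at its relative position within the range of non-negative longs. The helper below is a hypothetical sketch of that scaling, not a VirtData function.

```python
MAX_LONG = (1 << 63) - 1  # same value as Java's Long.MAX_VALUE


def to_unit_interval(value: int) -> float:
    """Place a non-negative long at its relative position within [0, MAX_LONG]."""
    if not 0 <= value <= MAX_LONG:
        raise ValueError("input must lie in [0, Long.MAX_VALUE]")
    return value / MAX_LONG
```

A double carries 53 bits of precision, so the 63-bit input range offers more distinct inputs than the resulting unit interval value can resolve.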

Beta

See Wikipedia: Beta distribution

See Commons JavaDoc: BetaDistribution

int -> Beta(double: alpha, double: beta, String… mods) -> double
long -> Beta(double: alpha, double: beta, String… mods) -> double

Binomial

See Wikipedia: Binomial distribution

See Commons JavaDoc: BinomialDistribution

int -> Binomial(int: trials, double: p, String… modslist) -> int
int -> Binomial(int: trials, double: p, String… modslist) -> long
long -> Binomial(int: trials, double: p, String… modslist) -> int
long -> Binomial(int: trials, double: p, String… modslist) -> long

Cauchy

See Wikipedia: Cauchy distribution

See Commons JavaDoc: CauchyDistribution

int -> Cauchy(double: median, double: scale, String… mods) -> double
long -> Cauchy(double: median, double: scale, String… mods) -> double

ChiSquared

See Wikipedia: Chi-squared distribution

See Commons JavaDoc: ChiSquaredDistribution

int -> ChiSquared(double: degreesOfFreedom, String… mods) -> double
long -> ChiSquared(double: degreesOfFreedom, String… mods) -> double

ConstantContinuous

Always yields the same value.

See Commons JavaDoc: ConstantContinuousDistribution

int -> ConstantContinuous(double: value, String… mods) -> double
long -> ConstantContinuous(double: value, String… mods) -> double

Enumerated

Creates a probability density given the values and optional weights provided, in “value:weight value:weight …” form. The weight can be elided for any value to use the default weight of 1.0d.

See Commons JavaDoc: EnumeratedRealDistribution

int -> Enumerated(String: data, String… mods) -> double
- ex: Enumerated('1 2 3 4 5 6') - a fair six-sided die roll
- ex: Enumerated('1:2.0 2 3 4 5 6') - an unfair six-sided die roll, where 1 has weight 2.0 and every other face has the default weight of 1.0
long -> Enumerated(String: data, String… mods) -> double
- ex: Enumerated('1 2 3 4 5 6') - a fair 6-sided die
- ex: Enumerated('1:2.0 2 3 4 5:0.5 6:0.5') - an unfair 6-sided die, where ones are twice as likely, and fives and sixes are half as likely
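The “value:weight” form used in these examples can be read as follows. This is a minimal sketch of the parsing and weighting idea with hypothetical function names; it assumes the values are listed in ascending order and it is not the Commons Math or VirtData implementation.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def parse_enumerated(data: str) -> Tuple[List[float], List[float]]:
    """Parse 'value:weight value ...' into values and normalized probabilities."""
    values, weights = [], []
    for token in data.split():
        value, _, weight = token.partition(":")
        values.append(float(value))
        weights.append(float(weight) if weight else 1.0)  # elided weight defaults to 1.0
    total = sum(weights)
    return values, [w / total for w in weights]


def enumerated_quantile(values: List[float], probs: List[float], u: float) -> float:
    """Pick the value whose cumulative probability band contains u in [0.0, 1.0]."""
    cumulative = list(accumulate(probs))
    index = min(bisect_right(cumulative, u), len(values) - 1)
    return values[index]


values, probs = parse_enumerated("1:2.0 2 3 4 5:0.5 6:0.5")
```

With these weights the total is 6.0, so a one has probability 2/6, the faces two through four have 1/6 each, and five and six have 1/12 each.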

Exponential

See Wikipedia: Exponential distribution

See Commons JavaDoc: ExponentialDistribution

int -> Exponential(double: mean, String… mods) -> double
long -> Exponential(double: mean, String… mods) -> double

F

See Wikipedia: F-distribution

See Commons JavaDoc: FDistribution

See Mathworld: F-Distribution

int -> F(double: numeratorDegreesOfFreedom, double: denominatorDegreesOfFreedom, String… mods) -> double
long -> F(double: numeratorDegreesOfFreedom, double: denominatorDegreesOfFreedom, String… mods) -> double

Gamma

See Wikipedia: Gamma distribution

See Commons JavaDoc: GammaDistribution

int -> Gamma(double: shape, double: scale, String… mods) -> double
long -> Gamma(double: shape, double: scale, String… mods) -> double

Geometric

See Wikipedia: Geometric distribution

See Commons JavaDoc: GeometricDistribution

int -> Geometric(double: p, String… modslist) -> int
int -> Geometric(double: p, String… modslist) -> long
long -> Geometric(double: p, String… modslist) -> int
long -> Geometric(double: p, String… modslist) -> long

Gumbel

See Wikipedia: Gumbel distribution

See Commons JavaDoc: GumbelDistribution

int -> Gumbel(double: mu, double: beta, String… mods) -> double
long -> Gumbel(double: mu, double: beta, String… mods) -> double

Hypergeometric

See Wikipedia: Hypergeometric distribution

See Commons JavaDoc: HypergeometricDistribution

int -> Hypergeometric(int: populationSize, int: numberOfSuccesses, int: sampleSize, String… modslist) -> int
int -> Hypergeometric(int: populationSize, int: numberOfSuccesses, int: sampleSize, String… modslist) -> long
long -> Hypergeometric(int: populationSize, int: numberOfSuccesses, int: sampleSize, String… modslist) -> int
long -> Hypergeometric(int: populationSize, int: numberOfSuccesses, int: sampleSize, String… modslist) -> long