Statistical Inference Software SIP

Statistical inference is always based on observations from the phenomenon under consideration. The second necessary component is a statistical model, which rests on the assumption that the observations contain random variation, that is, that they can be considered to have arisen from some probability distribution. Statistical inference concerns one or more unknown characteristics of the phenomenon from which the observations have arisen. These characteristics are functions of the parameter of the statistical model and are called the parameter functions of interest, or interest functions. Statistical inference consists of statements concerning the unknown values of the interest functions. It differs from other modes of inference in that it always gives measures of the uncertainty of the statements made. These measures of uncertainty are a necessary component of statistical inference and arise from the assumed randomness in the observations. They describe the reliability of the inference.

How does Statistical Inference Package SIP work?

Statistical inference starts with empirical data, which consist of the statistical units that have been selected into the empirical study. The dataset contains one or more statistical variables, which give the values of some properties of the units. The first task in the analysis is to decide which statistical variable or variables form the response or responses. The response is the statistical variable whose distribution, or some property of whose distribution, is of interest in the study. After the response has been selected, one has to decide which kind of statistical model is used to analyze the data, that is, which kind of distribution is used to model the random variation in the response. The most important aspect to attend to when deciding on the statistical model is the dependence between the statistical units. Usually the chosen statistical model also depends on other statistical variables in the data.

The statistical models in SIP are so-called parametric statistical models. This means that the distribution of the response is assumed to have some known functional form with a finite number of real-valued unknown quantities called parameters. Statistical inference then concerns the unknown values of the parameters or, more generally, the unknown values of some real-valued functions of the parameters. These functions are called interest functions. Given a real-valued interest function, the problem is to find those of its values that are supported by the statistical evidence, that is, by the observed values of the response and its statistical model. The SIP function ProfileInterval calculates the so-called profile likelihood based confidence interval for any smooth interest function and statistical model. The SIP function LRTest performs significance tests. Its result is the observed significance level, which measures the risk in making the statement that the statistical hypothesis is 'false', that is, in making the statement that the belief of the researcher is 'true'.

What is a statistical model?

A statistical model describes the random variation in the response and usually depends on other statistical variables in the dataset. One component of the statistical model, the parameter, represents the unknown aspects of the phenomenon under consideration. In SIP there are four main types of models: basic models, regression models, models for stochastic processes, and hierarchical models. The package contains a large collection of functions for constructing statistical models. In these functions the building blocks from which a statistical model is constructed can themselves be any statistical models.

What is the observed likelihood function?

The observed likelihood function based on the observed response and its statistical model is a function of the parameter of the statistical model. Its value at a given parameter value is the probability (for a continuous response, the probability density) of the observed response under the probability distribution that the model defines for that parameter value.
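
The definition can be illustrated with a minimal Python sketch. This is not SIP code (SIP runs in Mathematica); the binomial model, the function name binomial_likelihood, and the data (7 successes in 20 trials) are assumptions chosen only for illustration.

```python
import math

def binomial_likelihood(p, successes, trials):
    """Probability of the observed count when the success probability is p."""
    return math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)

# Hypothetical observed response: 7 successes in 20 independent trials.
observed_successes, trials = 7, 20

# With the response held fixed, the likelihood is a function of p alone.
for p in (0.2, 0.35, 0.5):
    print(p, binomial_likelihood(p, observed_successes, trials))
```

The response stays fixed while the parameter varies, which is what distinguishes the likelihood function from the probability distribution of the response.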

What is the maximum likelihood estimate?

The maximum likelihood estimate of the model parameter is the value of the parameter that maximizes the observed likelihood function.
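
As a rough Python sketch (not SIP code), the estimate can be found numerically by maximizing the log-likelihood, which has the same maximizer as the likelihood. The binomial model, the data, and the crude grid search are illustrative assumptions; a real implementation would use a proper optimizer.

```python
import math

def binomial_log_likelihood(p, successes, trials):
    # The binomial coefficient does not depend on p, so it is omitted.
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)

successes, trials = 7, 20  # hypothetical data

# Crude grid search over the open interval (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: binomial_log_likelihood(p, successes, trials))

# For this model the estimate is also known in closed form: successes / trials.
print(p_hat, successes / trials)
```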

What is the profile likelihood function?

When a real-valued function of the parameter is the main interest in a study, that function is called the interest function. The value of the observed profile likelihood function at a given value of the interest function is the constrained maximum of the observed likelihood function over those parameter values at which the interest function takes the given value.
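
A small Python sketch (not SIP code) may make the constrained maximum concrete. It assumes a normal model with parameters (mu, sigma) and takes the mean mu as the interest function; for fixed mu the maximizing variance is the average squared deviation from mu, so the constrained maximum has a closed form. The data and function name are hypothetical.

```python
import math

def profile_log_likelihood_mean(mu, data):
    """Normal log-likelihood maximized over sigma with the mean fixed at mu."""
    n = len(data)
    # Constrained maximum likelihood estimate of the variance for this mu.
    sigma2_hat = sum((y - mu) ** 2 for y in data) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2_hat) + 1)

data = [4.1, 5.3, 4.8, 6.0, 5.1, 4.6]  # hypothetical observations
sample_mean = sum(data) / len(data)

for mu in (4.0, sample_mean, 6.0):
    print(mu, profile_log_likelihood_mean(mu, data))
```

The profile log-likelihood is largest at the sample mean, which is the maximum likelihood estimate of the interest function in this model.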

What is a profile likelihood based confidence interval?

The profile likelihood based confidence interval for the unknown value of a real-valued interest function is a set of possible values of the interest function. It contains all those values for which the observed profile likelihood function of the interest function is at least as large as some given constant. The appropriate constant depends on the required confidence level of the confidence interval.
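
The following Python sketch shows one common way to compute such an interval; it is an illustration under stated assumptions, not the algorithm used by ProfileInterval. It assumes a normal model with the mean as the interest function and chooses the constant from the usual chi-square approximation with one degree of freedom: the interval keeps every value whose profile log-likelihood is within 3.841 / 2 of its maximum, which corresponds to an approximate 95% confidence level. The function names and data are hypothetical.

```python
import math

def profile_loglik(mu, data):
    """Normal log-likelihood maximized over sigma with the mean fixed at mu."""
    n = len(data)
    sigma2 = sum((y - mu) ** 2 for y in data) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def profile_interval(data, chi2_quantile=3.841):
    """Approximate interval for the mean; 3.841 is the 0.95 quantile of chi-square(1)."""
    mu_hat = sum(data) / len(data)
    cutoff = profile_loglik(mu_hat, data) - 0.5 * chi2_quantile

    def above_cutoff(mu):
        # Positive inside the interval, negative outside it.
        return profile_loglik(mu, data) - cutoff

    spread = max(data) - min(data)  # used only to bracket the endpoints
    lower = bisect(above_cutoff, mu_hat - 10 * spread, mu_hat)
    upper = bisect(above_cutoff, mu_hat, mu_hat + 10 * spread)
    return lower, upper

data = [4.1, 5.3, 4.8, 6.0, 5.1, 4.6]  # hypothetical observations
print(profile_interval(data))
```

The endpoints are the two values of the interest function at which the profile log-likelihood crosses the cutoff, so a higher confidence level (a larger quantile) gives a wider interval.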

What is the confidence level of a confidence interval?

The confidence level is a measure of the reliability of the statement that the unknown value of the interest function belongs to the actual computed interval. In other words, one minus the confidence level measures the risk involved in making that statement.

What is a likelihood ratio test?

A statistical hypothesis is an assumption concerning the unknown parameter of the statistical model. It is a kind of opposite to the research hypothesis. The likelihood ratio test is a statistical significance test that measures the amount of evidence in the response and its statistical model against the statistical hypothesis and in favor of the research hypothesis. The test is based on the observed value of the likelihood ratio statistic and on the distribution of that statistic under the statistical hypothesis. The likelihood ratio statistic is the ratio of two maximum values of the observed likelihood function: the maximum over all possible parameter values and the maximum over those parameter values that satisfy the statistical hypothesis.
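
A minimal Python sketch of such a test follows; it is an illustration, not the implementation behind LRTest. It assumes a binomial model with the statistical hypothesis p = p0, works with twice the logarithm of the likelihood ratio, and approximates the distribution of that quantity under the hypothesis by a chi-square distribution with one degree of freedom. The function names and data are hypothetical.

```python
import math

def log_lik(p, successes, trials):
    """Binomial log-likelihood without the constant binomial coefficient."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)

def lr_test(successes, trials, p0):
    """Return (statistic, approximate observed significance level) for H: p = p0.

    Assumes 0 < successes < trials so that both logarithms are defined.
    """
    p_hat = successes / trials  # unrestricted maximum likelihood estimate
    statistic = 2 * (log_lik(p_hat, successes, trials) - log_lik(p0, successes, trials))
    statistic = max(0.0, statistic)  # guard against tiny negative rounding errors
    # Upper tail probability of chi-square(1): P(Z^2 > w) = erfc(sqrt(w / 2)).
    significance = math.erfc(math.sqrt(statistic / 2))
    return statistic, significance

stat, osl = lr_test(successes=7, trials=20, p0=0.5)
print(stat, osl)
```

A small observed significance level indicates that the data would be unusual if the statistical hypothesis held, and so counts as evidence for the research hypothesis.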

What is the observed significance level of a likelihood ratio test?

The observed significance level measures the risk in making the statement that the statistical hypothesis is 'false', that is, in making the statement that the belief of the researcher is 'true'. The observed significance level thus tells whether the response and the model contain enough evidence to support the claim of the researcher.

What is a sampling model?

Often the data consist of a response vector whose components are assumed to be independent observations from the same model. In SIP the sampled model can be any statistical model that can be defined using the package. Thus, in addition to samples from any distribution, one can define sampling models from any other model, for example from regression, stochastic process, and mixture models.

What is an independence model?

In a sampling model the response vector consists of statistically independent components that have the same model. A generalization is a statistical model for a response vector that also consists of statistically independent components, but with different models. Usually the different models have the same form with their own parameters, although some parameters might be common. In SIP, however, it is possible to define statistical models for response vectors with statistically independent components and models of different forms. The set of independence models contains the so-called ANOVA-type models.

What is a submodel?

Sometimes it is necessary to consider a statistical model whose parameter space is a subset of the parameter space of a given statistical model but which otherwise has the same form. In SIP the user can define very complicated submodels for any statistical model that is available in the package.

What is a regression model?

An extremely common situation in empirical research is that the response or responses in the observed data depend systematically on some explanatory variables but are not completely determined by their values, and thus contain additional randomness. Regression models form a large class of statistical models suitable for these situations.
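
As one simple member of this class, the Python sketch below fits a straight-line regression with independent normal errors. It is an illustration only, not SIP code; for this particular model the maximum likelihood estimates of intercept and slope coincide with the least-squares estimates, and the maximum likelihood estimate of the error variance divides by n. The function name and data are hypothetical.

```python
def fit_line(x, y):
    """Maximum likelihood fit of y = intercept + slope * x + normal error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Systematic part: intercept + slope * x. Random part: the residuals.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    error_variance = sum(r * r for r in residuals) / n
    return intercept, slope, error_variance

x = [1, 2, 3, 4, 5]               # hypothetical explanatory variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical response
print(fit_line(x, y))
```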

What is a Markov chain model?

Assume that the response in the observed data consists of a vector whose components are observations from some process, so that the order of the observations is important. Assume also that the components belong to a finite set, the so-called state space of the process, and that the conditional distribution of the state of the process given all the previous states depends only on the previous m states. If, finally, this conditional distribution does not depend on which component is considered, the process is called a time-homogeneous Markov chain of order m.
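
For the simplest case, order m = 1, the Python sketch below estimates the transition probabilities from one observed sequence by dividing transition counts by the number of visits to the starting state; these are the maximum likelihood estimates when the first state is treated as given. This is an illustration, not SIP code, and the function name and sequence are hypothetical.

```python
from collections import Counter

def estimate_transitions(sequence):
    """Estimated first-order transition probabilities from one observed chain."""
    pair_counts = Counter(zip(sequence, sequence[1:]))  # counts of (from, to) pairs
    from_counts = Counter(sequence[:-1])                # visits that have a successor
    return {
        (a, b): count / from_counts[a]
        for (a, b), count in pair_counts.items()
    }

chain = list("AABABBBAAB")  # hypothetical observations with state space {A, B}
for (a, b), prob in sorted(estimate_transitions(chain).items()):
    print(a, "->", b, prob)
```

Time homogeneity is what allows all positions in the sequence to be pooled into a single table of counts.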

What is a mixture model?

Assume that an observation arises in two stages. In the first stage a random unobservable outcome is generated from some discrete model with unknown point probabilities. In the second stage the actual observation arises from a conditional statistical model that depends on the outcome of the first stage. The model of the actual observation is thus a finite mixture of the conditional models. Usually the members of the finite mixture have the same form with their own parameters, although some parameters might be common. In SIP, however, it is possible to define mixture models whose component models have different functional forms.
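
The two-stage description translates directly into a short Python sketch (not SIP code). It assumes a mixture of two normal components; the weights, means, standard deviations, and function names are hypothetical. The density of an observation is the weighted sum of the component densities, and a draw is generated by first choosing the hidden component and then sampling from it.

```python
import math
import random

def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def mixture_pdf(y, weights, means, sds):
    """Density of a finite mixture: weighted sum of the component densities."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))

def sample_mixture(rng, weights, means, sds):
    # Stage 1: unobservable outcome from the discrete model.
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # Stage 2: observation from the conditional model selected in stage 1.
    return rng.gauss(means[k], sds[k])

weights, means, sds = [0.3, 0.7], [0.0, 5.0], [1.0, 2.0]  # hypothetical parameters
rng = random.Random(0)
print([round(sample_mixture(rng, weights, means, sds), 2) for _ in range(5)])
print(mixture_pdf(1.0, weights, means, sds))
```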

What is a hidden Markov model?

Assume that the response consists of a vector whose components are observations from some process, so that the order of the observations is important. Assume also that the response arises in two stages. In the first stage an unobservable random vector is generated; it has as many components as the response, its components belong to a finite set, the so-called state space of the hidden process, and the conditional distribution of the state of the hidden process given all the previous states depends only on the previous state. Thus the hidden process is a time-homogeneous Markov chain of order 1. In the second stage the actual observations are generated so that the components of the response vector are conditionally independent given the hidden states, and the distribution of an observation given the hidden states, the so-called emission model, depends only on the hidden state corresponding to that observation. A hidden Markov model (HMM) is a statistical model for this kind of observation.
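
Because the hidden states are not observed, the likelihood of an HMM sums over all possible hidden paths. The Python sketch below computes it with the standard forward algorithm for a discrete emission model. It is an illustration, not SIP code; the function name, the two-state parameters, and the coding of observations as 0 and 1 are assumptions.

```python
def hmm_likelihood(observations, initial, transition, emission):
    """Probability of the observation sequence, via the forward algorithm.

    initial[s]        probability that the hidden chain starts in state s
    transition[r][s]  probability of moving from hidden state r to s
    emission[s][o]    probability of observing o in hidden state s
    """
    states = range(len(initial))
    # alpha[s]: joint probability of the observations so far and current state s.
    alpha = [initial[s] * emission[s][observations[0]] for s in states]
    for obs in observations[1:]:
        alpha = [
            emission[s][obs] * sum(alpha[r] * transition[r][s] for r in states)
            for s in states
        ]
    return sum(alpha)

# Hypothetical two-state model with observations coded as 0 or 1.
initial = [0.6, 0.4]
transition = [[0.7, 0.3], [0.2, 0.8]]
emission = [[0.9, 0.1], [0.3, 0.7]]
print(hmm_likelihood([0, 0, 1, 1], initial, transition, emission))
```

The forward recursion needs work proportional to the sequence length times the square of the number of hidden states, instead of enumerating every hidden path.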

Can confidence intervals be calculated for nonlinear parameter functions?

Yes, they can. In SIP a profile likelihood based confidence interval can be calculated for any linear or smooth nonlinear interest function of the parameters.

Can likelihood ratio tests be calculated for nonlinear hypotheses?

Yes, they can. SIP can handle statistical hypotheses that correspond to restricted statistical models defined by any linear or smooth nonlinear functions of the parameters.

Can properties of statistical models be calculated in symbolic form?

Yes, they can. Almost all properties of almost all statistical models can be calculated in symbolic form. Statistical inferences, however, are always calculated from the actual dataset and are thus always numerical.

What do I need to run SIP, and how do I order it?

Statistical Inference Package SIP requires Mathematica 5.0 or higher. The software can be purchased from Wolfram Research by downloading it from (link will be added soon).

Where can I get help if I have technical questions about SIP?

Support is available from the developer, who can be contacted by email at support@statisticalinference.com.