VARIANCE COMPONENTS AND MIXED MODEL ANOVA/ANCOVA.
Variance Components and Mixed Model ANOVA/ANCOVA is a specialized module for designs with random effects and/or factors
with many levels; options for handling random effects and for estimating variance components are also provided in the
General Linear Models module. Random effects (factors) occur frequently in industrial research, when the levels of a factor
represent values sampled from a random variable (as opposed to being deliberately chosen or arranged by the experimenter).
The Variance Components module will allow you to analyze designs with any combination of fixed effects, random effects, and
covariates. Extremely large ANOVA/ANCOVA designs can be efficiently analyzed: factors can have several hundred levels.
The program will analyze standard factorial (crossed) designs as well as hierarchically nested designs, and compute the standard
Type I, II, and III analysis of variance sums of squares and mean squares for the effects in the model. In addition, you can
compute the table of expected mean squares for the effects in the design, the variance components for the random effects in the
model, the coefficients for the denominator synthesis, and the complete ANOVA table with tests based on synthesized error sums of
squares and degrees of freedom (using Satterthwaite's method).
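As a rough illustration of the denominator synthesis described above, the following Python sketch forms a synthesized error mean square as a linear combination of mean squares and approximates its degrees of freedom with Satterthwaite's formula. The function name, the coefficients, and the numbers are hypothetical and are not taken from the module.

```python
def satterthwaite(coefs, mean_squares, dfs):
    """Return (synthesized MS, approximate df) for sum(c_i * MS_i)."""
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    ms_synth = sum(terms)
    # Satterthwaite: df = (sum c_i MS_i)^2 / sum((c_i MS_i)^2 / df_i)
    df_synth = ms_synth ** 2 / sum(t ** 2 / df for t, df in zip(terms, dfs))
    return ms_synth, df_synth


# Hypothetical denominator: MS(A*B) + MS(A*C) - MS(A*B*C)
ms, df = satterthwaite([1, 1, -1], [12.0, 9.0, 4.0], [4, 6, 12])
print(ms, df)
```

The synthesized degrees of freedom are generally not an integer, which is why the resulting F test is approximate.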
Other methods for estimating variance components are also supported (e.g., MIVQUE0, Maximum Likelihood [ML], Restricted Maximum
Likelihood [REML]). For maximum likelihood estimation, both the Newton-Raphson and Fisher scoring algorithms are used, and the
model will not be arbitrarily changed (reduced) during estimation to handle situations where most components are at or near zero.
Several options for reviewing the weighted and unweighted marginal means, and their confidence intervals, are also available.
Extensive graphics options can be used to visualize the results.
SURVIVAL/FAILURE TIME ANALYSIS.
This module features a comprehensive implementation of a variety of techniques for analyzing censored data from social,
biological, and medical research, as well as procedures used in engineering and marketing (e.g., quality control,
reliability estimation, etc.). In addition to computing life tables with various descriptive statistics and Kaplan-Meier
product limit estimates, the user can compare the survivorship functions in different groups using a large selection of
methods (including the Gehan test, Cox F-test, Cox-Mantel test, Log-rank test, and Peto & Peto generalized Wilcoxon test).
Also, Kaplan-Meier plots can be computed for groups (uncensored observations are identified in graphs with different point markers).
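For readers unfamiliar with the Kaplan-Meier product limit estimate mentioned above, the Python sketch below shows the basic computation on a small made-up sample. It is a minimal illustration, not the module's implementation; the function name and data are hypothetical, and an event flag of 1 is assumed to mean an observed failure while 0 means a censored observation.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survivorship function.

    events[i] is 1 for an observed failure and 0 for a censored case.
    Returns (time, survival) pairs at each distinct failure time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all observations tied at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Failures and censored cases both leave the risk set after time t
        n_at_risk -= removed
    return curve


print(kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]))
```

Censored cases do not lower the estimate directly; they only shrink the number at risk for later failure times.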
The program also features a selection of survival function fitting procedures (including the Exponential, Linear Hazard,
Gompertz, and Weibull functions) based on either unweighted or weighted least squares methods (maximum-likelihood parameter
estimates for various distributions, including Weibull, can also be computed via the STATISTICA Process Analysis module).
Finally, the program offers full implementations of four general explanatory models (Cox's proportional hazard model,
exponential regression model, log-normal and normal regression models) with extended diagnostics, including stratified
analysis and graphs of survival for user-specified values of predictors. For Cox proportional hazard regression, the user
can choose to stratify the sample to permit different baseline hazards in different strata (but a constant coefficient vector),
or the user can allow for different baseline hazards as well as coefficient vectors.
In addition, general facilities are provided to define one or more time-dependent covariates. Time-dependent covariates can be
specified via a flexible formula interpreter that allows the user to define the covariates via arithmetic expressions which may
include time, as well as the standard logical functions (e.g., timdep=age+age*log(t_)*(age>45), where t_ references survival time)
and a wide variety of distribution functions. As in all other modules of STATISTICA, the user can access and change the technical
parameters of all procedures (or accept dynamic defaults).
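The example formula above, timdep=age+age*log(t_)*(age>45), relies on a logical expression that evaluates to 1 when true and 0 when false. The Python sketch below mirrors that behavior for illustration only; it assumes log denotes the natural logarithm, and the function name is hypothetical rather than part of the formula interpreter.

```python
import math


def timdep(age, t):
    """Evaluate age + age*log(t)*(age > 45) for one case.

    The comparison (age > 45) acts as a 0/1 indicator, so the
    time-dependent term only contributes for cases older than 45.
    """
    return age + age * math.log(t) * (age > 45)


print(timdep(40, 10.0))  # the indicator is 0, so only age remains
print(timdep(50, 10.0))  # the indicator is 1, so the log(t) term is added
```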
The module also offers an extensive selection of graphics and specialized diagrams to aid in the interpretation of results
(including plots of cumulative proportions surviving/failing, patterns of censored data, hazard and cumulative hazard functions,
probability density functions, group comparison plots, distribution fitting plots, various residual plots, and many others). For
engineering applications, see also Weibull Analysis.

GENERAL NONLINEAR ESTIMATION (and Quick Logit/Probit Regression).
The Nonlinear Estimation module allows the user to fit essentially any type of nonlinear model. One of the unique features of this
module is that (unlike traditional nonlinear estimation programs) it does not impose any limits on the size of data files that it
can process.
Estimation Methods. The models can be fit using least squares or maximum-likelihood estimation, or any user-specified loss
function. When using the least-squares criterion, the very efficient Levenberg-Marquardt and Gauss-Newton algorithms can be used
to estimate the parameters for arbitrary linear and nonlinear regression problems. For large datasets or for difficult nonlinear
regression problems (such as those rated "higher difficulty" among the Statistical Reference Datasets provided by the National
Institute of Standards and Technology; see http://www.nist.gov/itl/div898/strd/index.html), these least-squares algorithms
are the recommended method for computing precise parameter estimates.
When using arbitrary loss functions, the user can choose from among four very different, powerful estimation procedures
(quasi-Newton, Simplex, Hooke-Jeeves pattern moves, and Rosenbrock pattern search method of rotating coordinates) so that stable
parameter estimates can be obtained in practically all cases, and even in extremely numerically-demanding conditions (see the
Validation Benchmarks).
Models. The user can specify any type of model by typing in the respective equation into an equation editor. The equations
may include logical operators; thus, discontinuous (piecewise) regression models and models including indicator variables can also
be estimated. The equations may also include a wide selection of distribution functions and cumulative distribution functions (Beta,
Binomial, Cauchy, Chi-square, Exponential, Extreme value, F, Gamma, Geometric, Laplace, Logistic, Normal, Log-Normal, Pareto,
Poisson, Rayleigh, t (Student), or Weibull distribution). The user has full control over all aspects of the estimation procedure
(e.g., starting values, step sizes, convergence criteria, etc.). The most common nonlinear regression models are predefined in the
Nonlinear Estimation module, and can be chosen simply as menu options. Those regression models include stepwise Probit and Logit
regression, the exponential regression model, and linear piecewise (break point) regression. Note that STATISTICA also
includes implementations of powerful algorithms for fitting generalized linear models, including probit and multinomial logit
models, and generalized additive models; see the respective descriptions for additional details.
Results. In addition to various descriptive statistics, standard results of the nonlinear estimation include the parameter
estimates and their standard errors (computed independently of the estimation itself, via finite differencing to optimize precision;
see the Validation Benchmarks); the variance/covariance matrix of parameter estimates, the predicted values, residuals, and
appropriate measures of goodness-of-fit (e.g., log-likelihood of estimated/null models and Chi-square test of difference, proportion
of variance accounted for, classification of cases and odds-ratios for Logit and Probit models, etc.).
Predicted and residual values can be appended to the data file for further analyses. For Probit and Logit models, the incremental
fit is also automatically computed when adding or deleting parameters from the regression model (thus, the user can explore the
data via a stepwise nonlinear estimation procedure; options for automatic forward and backward stepwise regression as well as
best-subset selection of predictors in logit and probit models are provided in the Generalized Linear Models module, below).
Graphs. All output is integrated with extensive selections of graphs, including interactively-adjustable 2D and 3D (surface)
arbitrary function fitting graphs which allow the user to visualize the quality of the fit and identify outliers or ranges of
discrepancy between the model and the data; the user can interactively adjust the equation of the fitted function (as shown in the
graph) without re-processing the data and visualize practically all aspects of the nonlinear fitting process. Many other
specialized graphs are provided to evaluate the fitting process and visualize the results, such as histograms of all selected
variables and residual values, scatterplots of observed versus predicted values and predicted versus residual values, normal and
half-normal probability plots of residuals, and many others.
LOG-LINEAR ANALYSIS OF FREQUENCY TABLES.
This module offers a complete implementation of log-linear modeling procedures for multi-way frequency tables. Note that
STATISTICA also includes the Generalized Linear Models module, which provides options for analyzing binomial and multinomial
logit models with coded ANOVA/ANCOVA-like designs.
In the Log-Linear Analysis module, the user can analyze up to 7-way tables in a single run. Both complete and incomplete tables
(with structural zeros) can be analyzed. Frequency tables can be computed from raw data, or may be entered directly into the program.
The Log-Linear Analysis module provides a comprehensive selection of advanced modeling procedures in an interactive and flexible environment
that greatly facilitates exploratory and confirmatory analyses of complex tables. The user may at all times review the complete observed
table as well as marginal tables, and fitted (expected) values, and may evaluate the fit of all partial and marginal association models
or select specific models (marginal tables) to be fitted to the observed data.
The program also offers an intelligent automatic model selection procedure that first determines the necessary order of interaction
terms required for a model to fit the data, and then, through backwards elimination, determines the best sufficient model to
satisfactorily fit the data (using criteria determined by the user).
The standard output includes G-square (Maximum-Likelihood Chi-square), the standard Pearson Chi-square with the appropriate
degrees of freedom and significance levels, the observed and expected tables, marginal tables, and other statistics. Graphics
options available in the Log-Linear Analysis module include a variety of 2D and 3D graphs designed to visualize 2-way and multi-way
frequency tables (including interactive, user-controlled cascades of categorized histograms and 3D histograms revealing "slices"
of multi-way tables), plots of observed and fitted frequencies, plots of various residuals (standardized, components of
Maximum-Likelihood Chi-square, Freeman-Tukey deviates, etc.), and many others.
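To make the two fit statistics named above concrete, the Python sketch below computes G-square (the Maximum-Likelihood Chi-square) and the Pearson Chi-square for the simplest log-linear model, independence in a two-way table. The function name and the table are hypothetical, and the sketch does not reflect how the module fits higher-order models.

```python
import math


def independence_fit(table):
    """Return (G2, X2, df) for the independence model on a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    g2 = 0.0
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if observed > 0:  # a zero cell contributes nothing to G2
                g2 += 2.0 * observed * math.log(observed / expected)
            x2 += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return g2, x2, df


print(independence_fit([[30, 10], [10, 30]]))
```

Both statistics are zero when the observed table matches the expected table exactly, and both are referred to the Chi-square distribution with the degrees of freedom shown.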
TIME SERIES ANALYSIS/FORECASTING.
The Time Series module contains a wide range of descriptive, modeling, decomposition, and forecasting methods for both time and
frequency domain models. These procedures are integrated, that is, the results of one analysis (e.g., ARIMA residuals) can be used
directly in subsequent analysis (e.g., to compute the autocorrelation of the residuals).
Also, numerous flexible options are provided to review and plot single or multiple series. Analyses can be performed on even very
long series. Multiple series can be maintained in the active work area of the program (e.g., multiple raw input data series or
series resulting from different stages of the analysis); the series can be reviewed and compared. The program will automatically
keep track of successive analyses, and maintain a log of transformations and other results (e.g., ARIMA residuals, seasonal
components, etc.). Thus, the user can always return to prior transformations or compare (plot) the original series together with
its transformations. Information about the consecutive transformations is maintained in the form of long variable labels, so if you
save the newly created variables into a dataset, the "history" of each of the series will be permanently preserved. The specific
Time Series procedures are described in the following subsections.
Transformations, Modeling, Plots, Autocorrelations. The available time series transformations allow the user to fully
explore patterns in the input series, and to perform all common time series transformations, including: de-trending, removal of
autocorrelation, moving average smoothing (unweighted and weighted, with user-defined or Daniell, Tukey, Hamming, Parzen, or
Bartlett weights), moving median smoothing, simple exponential smoothing (see also the description of all exponential smoothing
options below), differencing, integrating, residualizing, shifting, 4253H smoothing, tapering, Fourier (and inverse)
transformations, and others. Autocorrelation, partial autocorrelation, and crosscorrelation analyses can also be performed.
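As a small illustration of the autocorrelation analysis mentioned above, the Python sketch below computes sample autocorrelations of a series for lags 0 through a chosen maximum. It uses one common estimator (lagged cross-products of deviations from the mean, divided by the total sum of squared deviations); this estimator is an assumption here, and the module's exact formula may differ.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# A strictly alternating series is strongly negatively correlated at lag 1
print(autocorrelation([1, -1, 1, -1, 1, -1, 1, -1], 2))
```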
ARIMA and Interrupted Time Series (Intervention) Analysis. The Time Series module offers a complete implementation of ARIMA.
Models may include a constant, and the series can be transformed prior to the analysis; these transformations will automatically be
"undone" when ARIMA forecasts are computed, so that the forecasts and their standard
errors are expressed in terms of the values of the original input series. Approximate and exact maximum-likelihood conditional sums
of squares can be computed, and the ARIMA implementation in the Time Series module is uniquely suited to fitting models with long
seasonal periods (e.g., periods of 30 days). Standard results include the parameter estimates and their standard errors and the
parameter correlations. Forecasts and their standard errors can be computed and plotted, and appended to the input series. In
addition, numerous options for examining the ARIMA residuals (for model adequacy) are available, including a large selection of
graphs. The implementation of ARIMA in the Time Series module also allows the user to perform interrupted time series (intervention)
analysis. Several simultaneous interventions may be modeled, which can either be single-parameter abrupt-permanent interventions,
or two-parameter gradual or temporary interventions (graphs of different impact patterns can be reviewed). Forecasts can be
computed for all intervention models, which can be plotted (together with the input series) as well as appended to the original
series.
Seasonal and Non-Seasonal Exponential Smoothing. The Time Series module contains a complete implementation of all 12 common
exponential smoothing models. Models can be specified to contain an additive or multiplicative seasonal component and/or linear,
exponential, or damped trend; thus, available models include the popular Holt-Winters linear trend models.
The user may specify the initial value for the smoothing transformation, initial trend value, and seasonal factors (if appropriate).
Separate smoothing parameters can be specified for the trend and seasonal components. The user can also perform a grid search of
the parameter space in order to identify the best parameters; the respective results spreadsheet will report for all combinations
of parameter values the mean error, mean absolute error, sum of squares error, mean square error, mean percentage error, and mean
absolute percentage error. The smallest value for these fit indices will be highlighted in the spreadsheet. In addition, the user
can also request an automatic search for the best parameters with regard to the mean square error, mean absolute error, or mean
absolute percentage error (a general function minimization procedure is used for this purpose). The results of the respective
exponential smoothing transformation, the residuals, as well as the requested number of forecasts, are available for further
analyses and plots. A summary plot is also available to assess the adequacy of the respective exponential smoothing model; that
plot will show the original series together with the smoothed values and forecasts, as well as the smoothing residuals plotted
separately against the right-Y axis.
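The grid search described above can be sketched for the simplest case, simple exponential smoothing with a single parameter. The Python code below is a minimal illustration under stated assumptions: the initial smoothed value defaults to the first observation, the function names and the grid are hypothetical, and only some of the fit indices listed above are computed.

```python
def ses_errors(series, alpha, s0=None):
    """One-step-ahead forecast errors for simple exponential smoothing."""
    smoothed = series[0] if s0 is None else s0  # assumed initial value
    errors = []
    for x in series:
        errors.append(x - smoothed)  # forecast is the previous smoothed value
        smoothed = alpha * x + (1 - alpha) * smoothed
    return errors


def grid_search(series, alphas):
    """Return (best_alpha, best_mse, table); table maps alpha to fit indices."""
    table = {}
    for a in alphas:
        e = ses_errors(series, a)
        n = len(e)
        sse = sum(v * v for v in e)
        table[a] = {
            "mean_error": sum(e) / n,
            "mean_abs_error": sum(abs(v) for v in e) / n,
            "sse": sse,
            "mse": sse / n,
        }
    best = min(table, key=lambda a: table[a]["mse"])
    return best, table[best]["mse"], table


alphas = [k / 10 for k in range(1, 10)]
best, mse, table = grid_search([1, 2, 3, 4, 5, 6], alphas)
print(best, mse)
```

For a steadily trending series such as this one, the search favors the largest alpha in the grid, which is a hint that a model with a trend component would be more appropriate.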
Classical Seasonal Decomposition (Census Method I). The user may specify the length of the seasonal period, and choose
either the additive or multiplicative seasonal model. The program will compute the moving averages, ratios or differences, seasonal
factors, the seasonally adjusted series, the smoothed trend-cycle component, and the irregular component. Those components are
available for further analysis; for example, the user may compute histograms, normal probability plots, etc. for any or all of
these components (e.g., to test model adequacy).
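The sequence of steps listed above (moving averages, differences, seasonal factors) can be sketched for the additive model as follows. This Python example is a simplified illustration that assumes an even seasonal period and a series covering at least two full periods; the function name and data are hypothetical, and the smoothing of the trend-cycle and the irregular component are omitted.

```python
def additive_decomposition(series, period):
    """Classical additive decomposition for an even seasonal period.

    Returns (moving_average, seasonal_factors); the moving average is
    None at the ends of the series where it cannot be computed.
    """
    n, half = len(series), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        # Centered moving average: the two end points get half weight
        window = series[t - half:t + half + 1]
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    # Average the differences (series - moving average) within each season
    sums, counts = [0.0] * period, [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += series[t] - trend[t]
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    mean_raw = sum(raw) / period
    seasonal = [r - mean_raw for r in raw]  # center factors to sum to zero
    return trend, seasonal


pattern = [3.0, -1.0, -2.0, 0.0]
series = [10 + 0.5 * t + pattern[t % 4] for t in range(12)]
trend, seasonal = additive_decomposition(series, 4)
adjusted = [x - seasonal[t % 4] for t, x in enumerate(series)]
print(seasonal)
```

Subtracting the seasonal factor for each position from the series gives the seasonally adjusted series.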
X-11 Monthly and Quarterly Seasonal Decomposition and Seasonal Adjustment (Census Method II). The Time Series module
contains a full-featured implementation of the US Bureau of the Census X-11 variant of the Census Method II seasonal adjustment
procedure. While the original X-11 algorithms were not year-2000 compatible (only data prior to January 2000 could be analyzed),
the STATISTICA implementation of X-11 can handle data containing dates prior to January 1, 2000, after that date, or series
that will start prior to that date but terminate in or after the year 2000. The arrangement of options and dialogs closely follows
the definitions and conventions described in the Bureau of the Census documentation. Additive and multiplicative seasonal models
may be specified. The user may also specify prior trading-day factors and seasonal adjustment factors. Trading-day variation can be
estimated via regression (controlling for extreme observations), and used to adjust the series (conditionally if requested). The
standard options are provided for graduating extreme observations, for computing the seasonal factors, and for computing the
trend-cycle component (the user can choose between various types of weighted moving averages; optimal lengths and types of moving
averages can also automatically be chosen by the program). The final components (seasonal, trend-cycle, irregular) and the
seasonally adjusted series are automatically available for further analyses and plots; those components can also be saved for
further analyses with other programs. The program will produce the plots of the different components, including categorized plots
by months (or quarters).
Polynomial Distributed Lag Models. The implementation of the polynomial distributed lag methods in the Time Series module
will estimate models with unconstrained lags as well as (constrained) Almon distributed lags models. A selection of graphs is
available to examine the distributions of the model variables.
Spectrum (Fourier) and Cross-Spectrum Analysis. The Time Series module includes a full implementation of spectrum (Fourier
decomposition) analysis and cross-spectrum analysis techniques. The program is particularly suited for the analysis of unusually
long time series (e.g., with over 250,000 observations), and it will not impose any constraints on the length of the series (i.e.,
the length of the input series does not have to be a power of 2). However, the user may also choose to pad or truncate the series
prior to the analysis. Standard pre-analysis transformations include tapering, subtraction of the mean, and detrending. For single
spectrum analysis, the standard results include the frequency, period, sine and cosine coefficients, periodogram values, and spectral
density estimates. The density estimates can be computed using Daniell, Hamming, Bartlett, Tukey, Parzen, or user-defined weights
and user-defined window widths.
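The single-series results listed above can be illustrated with a direct (non-FFT) Fourier decomposition, which works for any series length. The Python sketch below is an assumption-laden example, not the module's algorithm: the function name is hypothetical, the periodogram is scaled as (cosine coefficient squared + sine coefficient squared) * N/2, which is one common convention, and spectral density smoothing is omitted.

```python
import math


def periodogram(series):
    """Return (frequency, period, cos_coef, sin_coef, periodogram) rows.

    Rows cover the Fourier frequencies k/N below the Nyquist frequency.
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]  # subtract the mean before the analysis
    rows = []
    for k in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * k / n
        a = 2.0 / n * sum(v * math.cos(w * t) for t, v in enumerate(x))
        b = 2.0 / n * sum(v * math.sin(w * t) for t, v in enumerate(x))
        rows.append((k / n, n / k, a, b, (a * a + b * b) * n / 2.0))
    return rows


# A pure cosine with 3 cycles in 16 observations
series = [math.cos(2.0 * math.pi * 3 * t / 16) for t in range(16)]
rows = periodogram(series)
peak = max(rows, key=lambda r: r[4])
print(peak[0], peak[1])
```

Sorting the rows by the last column in descending order corresponds to reviewing only the largest periodogram values.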
An option that is particularly useful for long input series is to display only a user-defined number of the largest periodogram or
density values in descending order; thus, the most salient periodogram or density peaks can be easily identified in long series.
The user can compute the Kolmogorov-Smirnov d test for the periodogram values to test whether they follow an exponential
distribution (i.e., whether the input is a white-noise series). Numerous plots are available to summarize the results; the user can
plot the sine and cosine coefficients, periodogram values, log-periodogram values, spectral density values, and log-density values
against the frequencies, period, or log-period. For long input series, the user can choose the segment (period) for which to plot the
respective periodogram or density values, thus enhancing the "resolution" of the periodogram or density plot.
For cross-spectrum analysis, in addition to the single spectrum results for each series, the program computes the cross-periodogram
(real and imaginary part), co-spectral density, quadrature spectrum, cross-amplitude, coherency values, gain values, and the phase
spectrum. All of these can also be plotted against the frequency, period, or log-period, either for all periods (frequencies) or
only for a user-defined segment. A user-defined number of the largest cross-periodogram values (real or imaginary) can also be
displayed in a spreadsheet in descending order of magnitude to facilitate the identification of salient peaks when analyzing long
input series.
As with all other procedures in the Time Series module, all of these result series can be appended to the active work area, and will
be available for further analyses with other time series methods or other STATISTICA modules.