STATISTICA Base
Features of STATISTICA Base
STATISTICA Base (a stand-alone product) offers a comprehensive set of essential statistics in a user-friendly package, along with all the performance,
power, and ease of use of the STATISTICA technology.
STATISTICA Base is compatible with Windows 2000, Windows XP and Windows Vista. It features the following modules:
DESCRIPTIVE STATISTICS, BREAKDOWNS, AND EXPLORATORY DATA ANALYSIS. STATISTICA Base offers a wide selection of methods for
exploratory analyses:
Descriptive Statistics and Graphs. The program will compute practically all common, general-purpose descriptive statistics
including medians, modes, quartiles, user-specified percentiles, means and standard deviations, quartile ranges, confidence limits
for the mean, skewness and kurtosis (with their respective standard errors), harmonic means, geometric means, as well as many
specialized descriptive statistics and diagnostics, either for all cases or broken down by one or more categorical (grouping)
variables. As with all modules of STATISTICA, a wide variety of graphs will aid exploratory analyses, e.g., various types
of box-and-whisker plots, histograms, bivariate distribution (3D or categorized) histograms, 2D and 3D scatterplots with marked
subsets, normal, half-normal, detrended probability plots, Q-Q plots, P-P plots, etc. A selection of tests is available for fitting
the normal distribution to the data (via the Kolmogorov-Smirnov, Lilliefors, and Shapiro-Wilk tests; facilities for fitting a
wide variety of other distributions are also available; see also STATISTICA Process Analysis; and the section on fitting in
the Graphics section).
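To make the list of descriptive statistics concrete, the following is a minimal sketch in Python (standard library only) of how several of them can be computed outside STATISTICA. It is not STATISTICA code; the function name `describe` and the sample values are assumptions made for illustration, and the confidence limits use the normal critical value 1.96 rather than a t critical value.

```python
import math
import statistics as st


def describe(values):
    """Return a dict of common descriptive statistics for a list of positive numbers."""
    n = len(values)
    mean = st.mean(values)
    sd = st.stdev(values)  # sample standard deviation (n - 1 denominator)
    q1, _, q3 = st.quantiles(values, n=4)  # quartile cut points
    # Approximate 95% confidence limits for the mean (normal approximation;
    # a t critical value would be more appropriate for small samples).
    half_width = 1.96 * sd / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "median": st.median(values),
        "mode": st.mode(values),
        "sd": sd,
        "q1": q1,
        "q3": q3,
        "quartile_range": q3 - q1,
        "ci_lower": mean - half_width,
        "ci_upper": mean + half_width,
        "harmonic_mean": st.harmonic_mean(values),
        "geometric_mean": st.geometric_mean(values),
    }


if __name__ == "__main__":
    sample = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical data
    for name, value in describe(sample).items():
        print(f"{name}: {value}")
```

The harmonic and geometric means are only defined here for positive data, which is why the sketch assumes positive values.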
By-Group Analyses (Breakdowns). Practically all descriptive statistics as well as summary graphs can be computed for data
that are categorized (broken down) by one or more grouping variables. For example, with just a few mouse clicks the user can break
down the data by Gender and Age and review categorized histograms, box-and-whisker plots, normal probability plots, scatterplots,
etc. If more than two categorical variables are chosen, cascades of the respective graphs can be automatically produced. Options
to categorize by continuous variables are provided, e.g., you can request that a variable be split into a requested number of
intervals, or use the on-line recode facility to custom-define the way in which the variable will be recoded (categorization options
of practically unlimited complexity can be specified at any point and they can reference relations involving all variables in the
dataset). In addition, a specialized hierarchical breakdown procedure is provided that allows the user to categorize the data by
up to six categorical variables, and compute a variety of categorized graphs, descriptive statistics, and correlation matrices for
subgroups (the user can interactively request to ignore some factors in the complete breakdown table, and examine statistics for
any marginal tables). Numerous formatting and labeling options allow the user to produce publication-quality tables and reports with
long labels and descriptions of variables.
Note that extremely large analysis designs can be specified in the breakdown procedure (e.g., 100,000 groups for a single
categorization variable), and results include all relevant ANOVA statistics (including the complete ANOVA table, tests of
assumptions such as the Levene and Brown-Forsythe tests for homogeneity of variance, a selection of seven post-hoc tests, etc.).
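The breakdown idea described above (categorizing cases by grouping variables, including a continuous variable split into a requested number of intervals) can be sketched as follows. This is an illustrative Python example, not STATISTICA's procedure; the variable names `Gender`, `Age`, `AgeBand`, and `Score`, the helper names, and the data are hypothetical.

```python
from collections import defaultdict
import statistics as st


def to_interval(value, lower, upper, n_intervals):
    """Map a continuous value to one of n equal-width intervals (0-based index)."""
    width = (upper - lower) / n_intervals
    index = int((value - lower) // width)
    return min(max(index, 0), n_intervals - 1)  # clamp values on or outside the bounds


def breakdown(rows, group_keys, value_key):
    """Compute n, mean, and standard deviation of value_key for each group combination."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row[value_key])
    return {
        key: {
            "n": len(vals),
            "mean": st.mean(vals),
            "sd": st.stdev(vals) if len(vals) > 1 else None,
        }
        for key, vals in groups.items()
    }


if __name__ == "__main__":
    rows = [  # hypothetical cases
        {"Gender": "F", "Age": 23, "Score": 10.0},
        {"Gender": "F", "Age": 27, "Score": 14.0},
        {"Gender": "M", "Age": 25, "Score": 9.0},
        {"Gender": "M", "Age": 52, "Score": 20.0},
        {"Gender": "F", "Age": 58, "Score": 18.0},
        {"Gender": "M", "Age": 29, "Score": 11.0},
    ]
    for row in rows:
        # Split Age (assumed range 20 to 60) into two equal-width intervals.
        row["AgeBand"] = to_interval(row["Age"], 20, 60, 2)
    for key, stats in sorted(breakdown(rows, ["Gender", "AgeBand"], "Score").items()):
        print(key, stats)
```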
As in all other modules of STATISTICA, extended precision calculations (the "quadruple" precision, where applicable) are
used to provide an unmatched level of accuracy (see the section on Precision). Because of the interactive nature of the program,
exploration of data is very easy. For example, exploratory graphs can be produced directly from all results Spreadsheets by pointing
with the mouse to specific cells or ranges of cells. Cascades of even complex (e.g., multiple categorized) graphs can be produced
with a single click of the mouse and reviewed in a slide-show manner. In addition to numerous predefined statistical graphs,
countless graphical visualizations of raw data, summary statistics, relations between statistics, as well as all breakdowns and
categorizations can be custom-defined by the user via straightforward point-and-click facilities designed to reduce the necessary
number of mouse clicks. All exploratory graphical techniques (described in the section on Graphics) are integrated with statistics
to facilitate graphical data analyses (e.g., via interactive outlier removal, subset selections, smoothing, function fitting,
extensive brushing options allowing the user to easily identify and/or extract the selected data, etc.). See also the section on
Block Statistics, below.

CORRELATIONS. A comprehensive set of options allows for the exploration of correlations and partial correlations between
variables. First, practically all common measures of association can be computed, including Pearson r, Spearman rank order R,
Kendall tau (b, c), Gamma, tetrachoric r, Phi, Cramer V, contingency coefficient C, Somers' D, uncertainty coefficients, part and
partial correlations, autocorrelations, various distance measures, etc. (nonlinear regressions, regressions for censored data and
other specialized measures of correlations are available in Nonlinear Estimation,
Survival Analysis, and other modules offered in STATISTICA Advanced Linear/Non-Linear Models). Correlation
matrices can be computed using casewise (listwise) or pairwise deletion of missing data, or mean substitution. As in all other
modules of STATISTICA, extended precision calculations (the "quadruple" precision, where applicable) are used to yield an
unmatched level of accuracy (see the section on Precision). Like all other results in STATISTICA, correlation matrices are
displayed in Spreadsheets offering various formatting options (see below) and extensive facilities to visualize numerical results;
the user can "point to" a particular correlation in the Spreadsheet and choose to display a variety of "graphical summaries" of
the coefficient (e.g., scatterplots with confidence intervals, various 3D bivariate distribution histograms, probability
plots, etc.).
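The difference between casewise (listwise) and pairwise deletion of missing data can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than STATISTICA's implementation: missing values are represented as `None`, the function names are hypothetical, and only Pearson r is computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns, deletion="pairwise"):
    """Correlation matrix as a dict keyed by (name_a, name_b).

    deletion="casewise": drop every case with a missing value in ANY variable.
    deletion="pairwise": for each pair, drop only cases missing in that pair.
    """
    names = list(columns)
    n_cases = len(columns[names[0]])
    complete = [i for i in range(n_cases)
                if all(columns[v][i] is not None for v in names)]
    matrix = {}
    for a in names:
        for b in names:
            if deletion == "casewise":
                idx = complete
            else:
                idx = [i for i in range(n_cases)
                       if columns[a][i] is not None and columns[b][i] is not None]
            matrix[(a, b)] = pearson_r([columns[a][i] for i in idx],
                                       [columns[b][i] for i in idx])
    return matrix


if __name__ == "__main__":
    data = {  # hypothetical data with missing values
        "X": [1, 2, 3, 4, 5],
        "Y": [2, 4, 5, 8, None],
        "Z": [5, None, 3, 2, 1],
    }
    print(correlation_matrix(data, "pairwise")[("X", "Y")])
    print(correlation_matrix(data, "casewise")[("X", "Y")])
```

With pairwise deletion each coefficient may be based on a different set of cases, so the two options generally give slightly different matrices.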
Brushing and outlier detection. The extensive brushing facilities in the scatterplots allow the user to select/deselect individual
points in the plot and assess their effect on the regression line (or other fitted function lines).
Display formats of numbers. A variety of global display formats for correlations are supported; significant correlation
coefficients can be automatically highlighted, each cell of the Spreadsheet can be expanded to display n and p, or detailed results
may be requested that include all descriptive statistics (pairwise means and standard deviations, B weights, intercepts, etc.).
Like all other numerical results, correlation matrices are displayed in Spreadsheets offering the zoom option and
interactively-controlled display formats (e.g., from +.4 to +.4131089276410193); thus, large matrices can be compressed (via either
the zoom or format-width control adjustable by dragging) to facilitate the visual search for coefficients which exceed a
user-specified magnitude or significance level (e.g., the respective cells can be marked red in the Spreadsheet).
Scatterplots, scatterplot matrices, by-group analyses. As in all output selection dialogs, numerous global graphics options are available
to further study patterns of relationships between variables, e.g., 2D and 3D scatterplots (with or without case labels) designed to identify
patterns of relations across subsets of cases or series of variables. Correlation matrices can be computed as categorized by
grouping variables and visualized via categorized scatterplots. Also "breakdowns of correlation matrices" can be generated (one
matrix per subset of data), displayed in queues of Spreadsheets, and saved as stacked correlation matrices (which can later be used
as input into the Structural Equations Modeling and Path Analysis [SEPATH] module
offered in STATISTICA Advanced Linear/Non-Linear Models). An entire correlation matrix can be summarized
in a single graph via the Matrix scatterplot option (of practically unlimited density); large scatterplot matrices can then be
reviewed interactively by "zooming in" on selected portions of the graph (or scrolling large graphs in the zoom mode) [see the
illustration]. Also, categorized scatterplot matrix plots can be generated (one matrix plot for each subset of data). Alternatively,
a multiple-subset scatterplot matrix plot can be created where specific subsets of data (e.g., defined by levels of a grouping
variable or selection conditions of any complexity) are marked with distinctive point markers. Various other graphical methods can
be used to visualize matrices of correlations in search of global patterns (e.g., contour plots, non-smoothed surfaces, icons,
etc.). All of these operations require only a few mouse clicks and various shortcuts are provided to simplify selections of
analyses; any number of Spreadsheets and graphs can be displayed simultaneously on the screen, making interactive exploratory
analyses and comparisons very easy.

BASIC STATISTICS FROM RESULTS SPREADSHEETS (TABLES). STATISTICA is a single integrated analysis system that presents
all numerical results in spreadsheet tables that are suitable (without any further modification) for input into subsequent
analyses. Thus, basic statistics (or any other statistical analysis) can be computed for results tables from previous analyses; for
example, you
could very quickly compute a table of means for 2000 variables, and next use this table as an input data file to further analyze the
distribution of those means across the variables. Thus, basic statistics are available at any time during your analyses, and can be
applied to any results spreadsheet.
Block Statistics. In addition to the detailed descriptive statistics that can be computed for every spreadsheet, you can also
highlight blocks of numbers in any spreadsheet, and produce basic descriptive statistics or graphs for the respective subset of
numbers only. For example, suppose you computed a results spreadsheet with measures of central tendency for 2000 variables (e.g.,
with Means, Modes, and Medians, Geometric Means, and Harmonic Means); you could highlight a block of, for example, 200 variables and
the Means and Medians, and then in a single operation produce a multiple line graph of those two measures across the subset of 200
variables. Statistical analysis by blocks can be performed by row or by column; for example, you could also compute a multiple line
graph for a subset of variables across the different measures of central tendency. To summarize, the block statistics facilities
allow you to produce statistics and statistical graphs from values in arbitrarily selected (highlighted) blocks of values in the
current data spreadsheet or output Spreadsheet.
INTERACTIVE PROBABILITY CALCULATOR. A flexible, interactive Probability Calculator is accessible from all toolbars. It
features a wide selection of distributions (including Beta, Cauchy, Chi-square, Exponential, Extreme value, F, Gamma, Laplace,
Lognormal, Logistic, Pareto, Rayleigh, t (Student), Weibull, and Z (Normal)); interactively (in-place) updated graphs built into
the dialog (a plot of the density and distribution functions) allow the user to visually explore distributions taking advantage of
the flexible STATISTICA Smart MicroScrolls which allow the user to advance either the last significant digit (press the
LEFT-mouse-button) or next to the last significant digit (press the RIGHT-mouse-button). Facilities are provided for generating
customizable, compound graphs of distributions with requested cutoff areas. Thus, this calculator allows you to interactively
explore the distributions (e.g., the respective probabilities depending on shape parameters).
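The kind of lookup a probability calculator performs (a cumulative probability for a given value, or the value for a given probability) can be sketched with Python's standard `statistics.NormalDist`. Only the Z (Normal) distribution is shown; the helper names are hypothetical and the sketch is unrelated to how STATISTICA computes these values.

```python
from statistics import NormalDist


def tail_areas(dist, x):
    """Return (lower-tail, upper-tail) probabilities at x for a NormalDist."""
    lower = dist.cdf(x)
    return lower, 1.0 - lower


def two_sided_cutoff(dist, alpha):
    """Value c such that the area above c equals alpha / 2."""
    return dist.inv_cdf(1.0 - alpha / 2.0)


if __name__ == "__main__":
    z = NormalDist(mu=0.0, sigma=1.0)  # standard normal
    lower, upper = tail_areas(z, 1.96)
    print("P(Z <= 1.96):", lower)
    print("P(Z >  1.96):", upper)
    print("two-sided 5% cutoff:", two_sided_cutoff(z, 0.05))
```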
t-TESTS and Other Tests of Group Differences. T-tests for dependent and independent samples, as well as single
samples (testing means against user-specified constants), can be computed; multivariate Hotelling's T² tests are also available
(see also ANOVA/MANOVA, and GLM (General Linear Models) offered in STATISTICA Advanced Linear/Non-Linear Models). Flexible options
are provided to allow comparisons between variables (e.g.,
treating the data in each column of the input spreadsheet as a separate sample) and coded groups (e.g., if the data includes a
categorical variable such as Gender to identify group membership for each case). As with all procedures, extensive diagnostics and
graphics options are available from the results menus. For example, for the t-test for independent samples, options are provided to
compute t-tests with separate variance estimates, Levene and Brown-Forsythe tests for homogeneity of variance, various
box-and-whisker plots, categorized histograms and probability plots, categorized scatterplots, etc. Other (more specialized) tests
of group differences are part of many modules (e.g., Nonparametrics (below), Survival Analysis (available in STATISTICA Advanced
Linear/Non-Linear Models), Reliability/Item Analysis (available in STATISTICA Multivariate Exploratory Techniques)).
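As an illustration of the t-test for independent samples with pooled versus separate variance estimates, here is a minimal Python sketch. It assumes the separate-variance option corresponds to the usual Welch statistic with Welch-Satterthwaite degrees of freedom, which the text above does not state explicitly; the function name and data are hypothetical, and p-values are omitted because the standard library has no t distribution.

```python
import math
import statistics as st


def t_test_independent(a, b, pooled=True):
    """Return (t, df) for two independent samples.

    pooled=True : classical t-test with a pooled variance estimate.
    pooled=False: separate variance estimates (Welch), with
                  Welch-Satterthwaite degrees of freedom.
    """
    na, nb = len(a), len(b)
    va, vb = st.variance(a), st.variance(b)  # sample variances
    diff = st.mean(a) - st.mean(b)
    if pooled:
        sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(sp2 * (1 / na + 1 / nb))
        df = na + nb - 2
    else:
        se = math.sqrt(va / na + vb / nb)
        df = (va / na + vb / nb) ** 2 / (
            (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
        )
    return diff / se, df


if __name__ == "__main__":
    group_a = [5.1, 4.9, 5.6, 5.8, 6.0]       # hypothetical scores
    group_b = [4.1, 4.5, 4.0, 4.8, 4.3, 4.6]  # hypothetical scores
    print("pooled:  ", t_test_independent(group_a, group_b, pooled=True))
    print("separate:", t_test_independent(group_a, group_b, pooled=False))
```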

MULTIPLE REGRESSION METHODS. The Multiple Regression module is a comprehensive implementation of linear regression
techniques, including simple, multiple, stepwise (forward, backward, or in blocks), hierarchical, nonlinear (including polynomial,
exponential, log, etc.), Ridge regression, with or without intercept (regression through the origin), and weighted least squares
models; additional advanced methods are provided in the General Regression Models (GRM) module (e.g., best subset regression,
multivariate stepwise regression for multiple dependent variables, for models that may include categorical factor effects;
statistical summaries for validation and
prediction samples, custom hypotheses, etc.). The Multiple Regression module will calculate a comprehensive set of statistics and
extended diagnostics including the complete regression table (with standard errors for B, Beta and intercept, R-square and adjusted
R-square for intercept and non-intercept models, and ANOVA table for the regression), part and partial correlation matrices,
correlations and covariances for regression weights, the sweep matrix (matrix inverse), the Durbin-Watson d statistic, Mahalanobis
and Cook's distances, deleted residuals, confidence intervals for predicted values, and many others.
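Two of the items named above, regression with or without an intercept and the Durbin-Watson d statistic, can be sketched for the single-predictor case as follows. This is a simplified Python illustration with hypothetical function names and data, not the module's algorithm; the Durbin-Watson statistic is computed as the sum of squared successive residual differences divided by the sum of squared residuals.

```python
def fit_line(x, y, intercept=True):
    """Least-squares fit of y = a + b*x; with intercept=False, y = b*x."""
    n = len(x)
    if intercept:
        mx, my = sum(x) / n, sum(y) / n
        b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
        a = my - b * mx
    else:
        # Regression through the origin: no centering of x or y.
        b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        a = 0.0
    return a, b


def residuals(x, y, a, b):
    """Observed minus predicted values, in case order."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def durbin_watson(res):
    """Durbin-Watson d; values near 2 suggest no first-order autocorrelation."""
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    den = sum(e * e for e in res)
    return num / den


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]                   # hypothetical predictor
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]     # hypothetical response
    a, b = fit_line(x, y)
    print("intercept:", a, "slope:", b)
    print("Durbin-Watson d:", durbin_watson(residuals(x, y, a, b)))
    print("through origin:", fit_line(x, y, intercept=False))
```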
Predicted and residual values. The extensive residual and outlier analysis features a large selection of plots, including a
variety of scatterplots, histograms, normal and half-normal probability plots, detrended plots, partial correlation plots,
different casewise residual and outlier plots and diagrams, and others. The scores for individual cases can be visualized via
exploratory icon plots and other multidimensional graphs integrated directly with the results Spreadsheets. Residual and predicted
scores can be appended to the current data file. A forecasting routine allows the user to perform what-if analyses, and to
interactively compute predicted scores based on user-defined values of predictors.
By-group analysis; related procedures. Extremely large regression designs can be analyzed. An option is also included to
perform multiple regression analyses broken down by one or more categorical variables (multiple regression analysis by group);
additional add-on procedures include a regression engine that supports models with thousands of variables, a Two-stage Least Squares
regression, as well as Box-Cox and Box-Tidwell transformations with graphs. An add-on package, STATISTICA Advanced Linear/Non-Linear Models, also includes general
nonlinear estimation modules (Nonlinear Estimation, Generalized Linear Models (GLZ),
Generalized Additive Models (GAM), Partial Least Squares models (PLS)) that can estimate practically any user-defined nonlinear model, including Logit, Probit, and others.
The add-on also includes SEPATH, the general Structural Equation Modeling and Path Analysis module, which allows the user
to analyze extremely large correlation, covariance, and moment matrices (for intercept models).

NONPARAMETRIC STATISTICS. The Nonparametric Statistics module features a comprehensive selection of inferential and
descriptive statistics including all common tests and some special application procedures. Available statistical procedures include
the Wald-Wolfowitz runs test, Mann-Whitney U test (with exact probabilities [instead of the Z approximations] for small samples),
Kolmogorov-Smirnov tests, Wilcoxon matched
pairs test, Kruskal-Wallis ANOVA by ranks, Median test, Sign test, Friedman ANOVA by ranks, Cochran Q test, McNemar test, Kendall
coefficient of concordance, Kendall tau (b, c), Spearman rank order R, Fisher's exact test, Chi-square tests, V-square statistic,
Phi, Gamma, Somers' d, contingency coefficients, and others. (Specialized nonparametric tests and statistics are also part of many
add-on modules, e.g., Survival Analysis, STATISTICA Process Analysis, and others.)
All (rank order) tests can handle tied ranks and apply corrections for small n or tied ranks. The program can handle extremely large
analysis designs. As in all other modules of STATISTICA, all tests are integrated with graphs (that include various
scatterplots, specialized box-and-whisker plots, line plots, histograms and many other 2D and 3D displays).
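How a rank-order test handles tied ranks can be shown with the Mann-Whitney U statistic, where tied observations receive the average of the ranks they span (midranks). The Python sketch below is illustrative only: the function names and data are hypothetical, and it computes the U statistic without the exact probabilities or tie-corrected Z approximation mentioned above.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


if __name__ == "__main__":
    group_a = [1, 2, 2]  # hypothetical data containing ties
    group_b = [2, 3, 4]
    print("midranks:", midranks(group_a + group_b))
    print("U statistics:", mann_whitney_u(group_a, group_b))
```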
ANOVA/MANOVA. The ANOVA/MANOVA module includes a subset of the functionality of the General Linear Models module (part of the Advanced Linear/Non-Linear Models add-on), and can perform univariate and multivariate analysis of variance of factorial designs with or without one repeated measures variable. For more complicated linear models with categorical and continuous predictor variables, random effects, and multiple repeated measures factors you need the General Linear Models module (stepwise and best-subset options are available in the General Regression Models module). In the ANOVA/MANOVA module, you can specify all designs in the most straightforward, functional terms of actual variables and levels (not in
technical terms, e.g., by specifying matrices of dummy codes), and even less-experienced ANOVA users can analyze very complex
designs with STATISTICA. Like the General Linear Models module, ANOVA/MANOVA provides three alternative user interfaces for
specifying designs: (1) A Design Wizard, that will take you step-by-step through the process of specifying a design, (2) a simple
dialog-based user-interface that will allow you to specify designs by selecting variables, codes, levels, and any design options
from well-organized dialogs, and (3) a Syntax Editor for specifying designs and design options using keywords and a common design
syntax.
Computational methods. The program will use, by default, the sigma restricted parameterization for factorial designs, and
apply the effective hypothesis approach (see Hocking, 19810) when the design is unbalanced or incomplete. Type I, II, III, and IV
hypotheses can also be computed, as can Type V and Type VI hypotheses that will perform tests consistent with the typical analyses
of fractional factorial designs in industrial and quality-improvement applications (see also the description of the Experimental
Design module).
Results statistics. The ANOVA/MANOVA module is not limited in any of its computational routines for reporting
results, so the full suite of detailed analytic tools available in the General Linear Models module is also available here (please
see the detailed description of the General Linear Models module for details); results include summary ANOVA tables, univariate
and multivariate results for repeated measures factors with more than 2 levels, the Greenhouse-Geisser and Huynh-Feldt adjustments,
plots of interactions, detailed descriptive statistics, detailed residual statistics, planned and post-hoc comparisons, testing of
custom hypotheses and custom error terms, detailed diagnostic statistics and plots (e.g., histogram of within-cell residuals,
homogeneity of variance tests, plots of means versus standard deviations, etc.).

DISTRIBUTION FITTING. The Distribution Fitting options allow the user to compare the distribution of a variable with a
wide variety of theoretical distributions. You may fit to the data the Normal, Rectangular, Exponential, Gamma, Lognormal,
Chi-square, Weibull, Gompertz, Binomial, Poisson, Geometric, or Bernoulli distribution.
The fit can be evaluated via the Chi-square test or the Kolmogorov-Smirnov one-sample test (the fitting parameters can be
controlled); the Lilliefors and Shapiro-Wilk tests are also supported (see above). In addition, the fit of a particular
hypothesized distribution to the empirical distribution can be evaluated in customized histograms (standard or cumulative) with
overlaid selected functions; line and bar graphs of expected and observed frequencies, discrepancies and other results can be
produced from the output Spreadsheets.
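The Kolmogorov-Smirnov one-sample statistic mentioned above measures the largest gap between the empirical distribution function and the hypothesized one. The following Python sketch computes it for a normal distribution whose mean and standard deviation are estimated from the sample; the function name and data are hypothetical. Because the parameters are estimated from the same data, the standard Kolmogorov-Smirnov critical values do not apply directly, which is the situation the Lilliefors test addresses; no p-value is computed here.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(sample):
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    data = sorted(sample)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))  # parameters estimated from the data
    d = 0.0
    for i, x in enumerate(data):
        cdf = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


if __name__ == "__main__":
    sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 5.4, 4.7, 5.0, 5.1]  # hypothetical data
    print("KS statistic D:", ks_statistic_normal(sample))
```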
Other distribution fitting options are available in STATISTICA Process Analysis, where the user can compute
maximum-likelihood parameter estimates for the Beta, Exponential, Extreme Value (Type I, Gumbel), Gamma, Log-Normal, Rayleigh, and
Weibull distributions.
Also included in that module are options for automatically selecting and fitting the best distribution for the data, as well as
options for general distribution fitting by moments (via Johnson and Pearson curves). User-defined 2- and 3-dimensional functions
can also be plotted and overlaid on the graphs. The functions may reference a wide variety of distributions such as the Beta,
Binomial, Cauchy, Chi-square, Exponential, Extreme value, F, Gamma, Geometric, Laplace, Logistic, Normal, Log-Normal, Pareto,
Poisson, Rayleigh, t (Student), or Weibull distribution, as well as their integrals and inverses. Additional facilities to fit
predefined or user-defined functions of practically unlimited complexity to the data are available in
Nonlinear Estimation (available in STATISTICA Advanced Linear/Non-Linear Models).