STATISTICA Enterprise-wide Data Mining System

The popularity of Data Mining methodology is rapidly growing in a wide variety of areas where specific tools are needed to make sense of ever-increasing amounts of information and to search for significant patterns and trends in large databases.

Data Mining can be used on all types of data including quality data (see Predictive Quality Control).

What is Data Mining?

Data Mining is the process of exploring and analysing large quantities of data, using various methods ranging from graphical exploratory techniques to highly specific methods.

Data Mining methods are used to extract meaningful new information from the data and fall into several categories, including description and visualisation, classification and clustering, estimation and prediction.

Applications of data mining include fraud detection, credit card scoring, personal profile marketing to enhance customer relations, direct marketing, trend analysis, financial market forecasting, bioinformatics, and product quality control and improvement. Web mining, in which data from the Web are analysed, helps businesses understand customers' online "click-stream" behaviour.

Why choose StatSoft and the STATISTICA Data Miner?

StatSoft is one of the largest producers of statistics and analytic graphics software in the world and has been established in this market since 1984. StatSoft is supported by a network of 20 international offices on all continents.

STATISTICA, StatSoft's major product line, enjoys an unprecedented record of recognition amongst users and reviewers. It has been rated FIRST in EVERY independent comparative review since its release in 1993. STATISTICA is available in English, Spanish, German, French and many other languages.

The STATISTICA Data Miner is a comprehensive, user-friendly set of data mining tools, designed to enable users to analyse their data quickly and easily in order to uncover hidden trends, explain known patterns and predict future outcomes. See below for a more detailed description of the STATISTICA Data Miner.

With over 18 years of experience in the field of data analysis, StatSoft offers the expertise not only to install and run the STATISTICA Data Miner at your site, but also to train your staff and provide consultancy services to help you make the most of the program.

Overview of the unique features of the STATISTICA Data Miner

From querying databases to generating final reports and graphs, STATISTICA Data Miner offers ease of use without sacrificing power or comprehensiveness. STATISTICA Data Miner features a wide selection of algorithms for classification, prediction, clustering and modelling, as well as an intuitive, icon-based interface.

STATISTICA Data Miner can work within a client-server architecture in order to offload time-consuming tasks from less powerful computers to dedicated servers. It also offers options to manage projects over the Web and work collaboratively across the corridor or across continents.

STATISTICA Data Miner comes with a wide selection of predefined projects and a "point-and-click" user interface, allowing users to easily build complex data mining projects without programming. The Data Miner is also fully programmable using the industry-standard STATISTICA Visual Basic. For customers who need a complete, deployed and ready-to-use solution designed to solve a specific type of problem, StatSoft provides deployment, on-site training and programming services.

The data mining solutions provided by STATISTICA Data Miner are driven by powerful procedures from five general data mining "techniques":

  • General Slicer/Dicer and Drill-down Explorer
  • General Classifier
  • General Modeler/Multivariate Explorer
  • General Forecaster
  • General Neural Networks Explorer

In addition to all of the general statistical and graphical options available in STATISTICA, STATISTICA Data Miner features a number of highly specialised Data Mining Modules, including:

  • Feature Selection and Variable Filtering (for very large data sets)
  • Mining for Association Rules
  • Interactive Drill-Down Explorer
  • Generalized Additive Models (GAM)
  • General Classification and Regression Trees (GTrees)
  • General CHAID (Chi-square Automatic Interaction Detection) Models
  • Interactive Trees (C&RT, CHAID)
  • Boosted Tree Classifiers and Regression
  • Multivariate Adaptive Regression Splines (MAR Splines)
  • Goodness of Fit Computations
  • Support Vector Machines
  • Naive Bayes Classifiers
  • K-Nearest Neighbour
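
To give a feel for what one of the techniques named above involves, the following is a minimal sketch of the two measures usually reported when mining for association rules: support and confidence. It is written in plain Python purely for illustration and does not use or represent the STATISTICA Data Miner interface; the function name `rule_metrics` and the sample baskets are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all transactions containing both item sets
    confidence = share of antecedent transactions that also contain
                 the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    both = antecedent | consequent
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if both <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical market-basket data, invented for this example.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk", "butter"},
]

support, confidence = rule_metrics(baskets, {"bread"}, {"butter"})
print(f"bread -> butter: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data set, two of the four baskets contain both bread and butter, and two of the three baskets with bread also contain butter, so the rule has a support of 0.50 and a confidence of about 0.67.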




STATISTICA Text Miner

STATISTICA Text Miner is an optional extension of STATISTICA Data Miner, ideal for translating unstructured text data into meaningful, valuable clusters of decision-making "gold". As most users familiar with data mining already know, real-world data comes in a variety of forms and is not always organized or ready to analyze. STATISTICA Text Miner digs for the underlying information that is not readily apparent in traditional structured data.

STATISTICA Text Miner was specifically designed as a general and open-architecture tool for mining unstructured information. The feature extraction/selection and other analytic tools available in STATISTICA Text Miner are not only applicable to text documents or Web pages, but can also be used to index, classify, cluster, or otherwise include in your analyses unstructured information such as (pre-processed) bitmaps, sound files, etc.
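
As a rough illustration of what feature extraction from unstructured text can mean, the sketch below turns a few short documents into a term-by-document count table that could then be indexed, classified or clustered. It is a generic bag-of-words example in plain Python, not the method or interface used by STATISTICA Text Miner; the function names and sample sentences are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, matrix), one row of word counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical claim narratives, invented for this example.
docs = [
    "Claim filed after water damage",
    "Water damage claim denied",
]

vocabulary, matrix = term_document_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row is a numeric description of one document, which is the form most classification and clustering procedures expect as input.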


How can I use STATISTICA Text Miner?
  • Analyze the contents of Web pages. For example, users can automatically process and summarize all Web pages of particular companies, message boards, etc.
  • Include unstructured notes in predictive data mining projects. For example, users may include responses to open-ended interview questions, patients' own descriptions of medical symptoms, etc. in data mining projects involving the clustering of patients and symptoms.
  • Analyze large document repositories. For example, users may analyze repositories of documents such as narratives of insurance claims, etc., to include such information in fraud detection projects.