Insightful Miner 3
What is data mining--and, more important, why should you
(as a quality professional) care?
Data mining was born in marketing, where it was used to
ferret out unsuspected linkages between variables in huge
data sets generated by computerized cash registers in retail
shopping. The classic example is the discovery that people
who buy diapers also tend to buy beer--so put them close
together and you can increase sales of both. In science,
it has revealed new insights into the ways in which industrial
activity in Eurasia affects the spread of Nile Fever in
the United States. It can bring to light patterns of influence
on production quality that you would never have dreamed
possible.
Insightful Miner 3 (IM3) is Insightful Corp.’s dedicated
data mining product, which works either alone or hand-in-glove
with the company’s high-end analysis product, S-Plus. IM3 uses
the drag-and-drop visual programming approach familiar throughout
this market sector and offers packaged export of code to
either S-Plus scripts or fully portable ANSI C routines
for use elsewhere.
The expression language is similar to that in S-Plus,
and where S-Plus 6.1 is also available, there’s a
library extending certain native IM3 functions to their
full S-Plus versions. This library also adds S-Plus graph
nodes.
IM3 features an instantly usable explorer model, with each
page holding a library of components (the S-Plus library,
if present, being one of these). Users can create and manage
additional libraries. One point to be aware of, though:
Although S-Plus will run on any version of Windows
from 98 to NT 4.0 and upward, IM3 insists on XP Professional
(I found it just as usable under XP Home Edition) or NT
4.0 SP6.
IM3 is omnivorous in its acceptance of data input. If
your data set can be imported into almost any mainstream
spreadsheet, database or analytical package, or any other
program for which ODBC drivers are installed, then it’s
accessible to Miner. The import filters have both intelligent
defaults and extensive tuning controls, allowing easy navigation
to specific worksheets or tables within the source. The
more common sources are handled by native drivers, and even
archaic records can be imported via text files. For S-Plus
users, there are also dedicated programming nodes for directly
reading or writing data in S-Plus chapters or transport
files.
Once the data are in, there’s a good set of tools
for preparatory manipulation, cleaning and evaluation. These
include dual-input comparison nodes, usually used to compare
outputs such as predicted and actual results, but also effective
in trapping transcription or other input errors. Data sets
can be transposed--a useful trick if used with care and
forethought.
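To make the error-trapping use of a dual-input comparison concrete, here is a minimal Python sketch. It is not IM3 code and makes no claim about how the comparison node works internally; the function name `compare_tables` and the sample batch data are invented for illustration. It assumes the same records have been keyed in twice and simply reports every cell where the two versions disagree.

```python
def compare_tables(left, right):
    """Return (row, column, left_value, right_value) for every differing cell.

    Hypothetical helper: each table is a list of dicts, one dict per row.
    """
    if len(left) != len(right):
        raise ValueError("tables must have the same number of rows")
    mismatches = []
    for row_index, (row_a, row_b) in enumerate(zip(left, right)):
        # Check every column that appears in either version of the row.
        for column in sorted(set(row_a) | set(row_b)):
            value_a = row_a.get(column)
            value_b = row_b.get(column)
            if value_a != value_b:
                mismatches.append((row_index, column, value_a, value_b))
    return mismatches


# Invented sample data: the same two records entered twice by hand.
first_entry = [
    {"batch": "A1", "diameter_mm": 12.01},
    {"batch": "A2", "diameter_mm": 11.98},
]
second_entry = [
    {"batch": "A1", "diameter_mm": 12.01},
    {"batch": "A2", "diameter_mm": 11.89},  # digits transposed on re-entry
]

mismatches = compare_tables(first_entry, second_entry)
for row, column, a, b in mismatches:
    print(f"row {row}, column {column}: {a} vs {b}")
```

The same comparison applies unchanged to predicted-versus-actual columns, which is the more usual job for such a node.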
In functional terms, nodes come in several classes. In
addition to standard links that pass Cartesian data sets
from one node to another, there is a model transfer type.
Prediction, C-generation and markup language export (hypertext
or the XML predictive model dialect) all sport these new
ports on their input side; principal components, regressions
(Cox, linear and logistic), K-means, naive Bayes, neural
nets and classification or regression trees all have model
ports on the output side.
IM3’s prediction node is the de facto centerpiece
of the whole show. The node has twin input ports--a standard
port for the data and a model port to provide the basis on
which predictions will be made. Models can be copied to
storage inside the prediction node or left dynamic. Results
on my two industrial test bed contracts, testing predictions
against known past outcomes, were impressive.
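The twin-port arrangement can be sketched in a few lines of Python. This is an illustration only, not IM3's implementation: the classes `LinearModel` and `PredictionNode`, the `keep_copy` flag and the numbers are all hypothetical. It also assumes one reading of "copied to storage or left dynamic", namely that a stored copy is frozen while a dynamic link follows later refits of the upstream model.

```python
import copy


class LinearModel:
    """Toy one-variable least-squares model standing in for any fitted model."""

    def __init__(self):
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, xs, ys):
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict_one(self, x):
        return self.intercept + self.slope * x


class PredictionNode:
    """Hypothetical node: a model arrives on one port, data on the other."""

    def __init__(self, model, keep_copy=False):
        # keep_copy=True freezes a private copy; otherwise the node keeps
        # a live reference and sees any later refit of the model.
        self.model = copy.deepcopy(model) if keep_copy else model

    def run(self, xs):
        return [self.model.predict_one(x) for x in xs]


def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Invented training data: y = 2x.
model = LinearModel().fit([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
static_node = PredictionNode(model, keep_copy=True)
dynamic_node = PredictionNode(model)

# Refit the upstream model on different (invented) data: y = 3x.
model.fit([1.0, 2.0, 3.0, 4.0], [3.0, 6.0, 9.0, 12.0])

static_predictions = static_node.run([5.0, 6.0])
dynamic_predictions = dynamic_node.run([5.0, 6.0])

# Score the frozen model against known past outcomes from the y = 2x regime.
static_error = mean_absolute_error(static_predictions, [10.0, 12.0])
print(static_predictions, dynamic_predictions, static_error)
```

Scoring predictions against outcomes that are already known, as in the last step, is the same kind of check described for the test bed contracts above.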
The worksheet offers a number of useful features, including
user controls on data block memory usage. Components can
be swept up together and represented as a black-box “collection
node,” saving space and improving visual comprehension.
For applications development, there are several optimization
and convenience features. You have an option to add a parameters
table to S-Plus script properties dialogs. There is also validity
checking, plus a radio button specifying where and how names,
types, etc., are to be provided. There’s more, but suffice to say
that the script node is a well-implemented facility that
extends the reach of IM3 models.
All in all, IM3 offers highly flexible exploration of
data in a very approachable way, letting beginners achieve
valuable plug-and-go results while experts move rapidly
toward optimized solutions in a larger environment.
Felix Grant is a lecturer and research consultant
in the United Kingdom.
Insightful Miner 3
Requirements: Desktop or server editions
available; 256 MB RAM; 300 MB disk; Windows 2000, 2003 (desktop
and server versions), NT 4.0 or XP. Also supports Microsoft
Terminal Services and Sun Solaris 2.6, 7, 8 and 9.
Price: Insightful Miner Server starts
at $27,000 for four users.
Contact:
Insightful Corp.
1700 Westlake Ave. N., Ste. 500
Seattle, WA 98109
Phone: (206) 283-8802
Fax: (206) 283-6310
E-mail: info@insightful.com
Web: www.insightful.com