By David Smith | September 19, 2012 11:00 AM EDT
This guest post is by Alex Guazzelli, VP of Analytics at Zementis Inc. -- ed.
PMML, the
Predictive Model Markup Language, is the de facto standard for representing predictive
analytics and data mining models. With PMML, it is easy to move a
predictive solution from one system to another, because an open format avoids
proprietary formats and the incompatibilities between them.
Companies around the globe are benefiting from PMML to
make instant use of their predictive solutions. With PMML, there is no
need for custom coding: you can easily move
your solution from the scientist’s desktop, where it was built, to the production
environment, where it is operationally deployed. Companies
also use PMML as the common language between service providers and external vendors.
In this way, it defines a single and clear process for the exchange of
predictive solutions. It becomes the bridge not only between data analysis,
model building, and deployment systems, but also between all the people and
teams involved in the analytical process. This is extremely important, since PMML
is used to disseminate knowledge and best practices, and to ensure
transparency.
All the top analytical tools, commercial and open-source,
support PMML. And, the language itself has reached a great level of maturity
and refinement. PMML 4.1, its latest version, makes it extremely easy for
predictive solutions to be represented in an open and standard way. With PMML, you
can represent a myriad of pre- and post-processing steps, besides the
predictive modeling techniques per se. PMML 4.1 allows multiple models
(model composition, chaining, segmentation, and ensembles, including random
forest models) to be represented by a single, concise language element. It
also allows for model outputs to be transformed into business decisions. Therefore,
a PMML file is able to represent the entire solution, from raw data to business
decision, with one or multiple predictive models.
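To make the structure of such a file concrete, here is a minimal sketch in Python (standard library only) that assembles a small PMML 4.1 document for a linear regression model. The function name `build_linear_pmml`, the field names, and the coefficients are hypothetical and chosen only for illustration; a real export from a modeling tool would typically carry more detail, such as transformations and output fields.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_1"  # namespace used by PMML 4.1 documents


def build_linear_pmml(intercept, coefficients, target="y"):
    """Return a PMML 4.1 string for a linear regression model.

    `coefficients` maps (hypothetical) input field names to their weights.
    """
    root = ET.Element("PMML", {"version": "4.1", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", {"description": "Illustrative linear model"})

    # DataDictionary: declares every field the model reads or produces.
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(
        root, "DataDictionary", {"numberOfFields": str(len(fields))}
    )
    for name in fields:
        ET.SubElement(
            dictionary,
            "DataField",
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    # The model element: a mining schema plus one regression table.
    model = ET.SubElement(
        root, "RegressionModel", {"modelName": "demo", "functionName": "regression"}
    )
    schema = ET.SubElement(model, "MiningSchema")
    for name in coefficients:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "predicted"})

    table = ET.SubElement(
        model, "RegressionTable", {"intercept": repr(float(intercept))}
    )
    for name, weight in coefficients.items():
        ET.SubElement(
            table,
            "NumericPredictor",
            {"name": name, "coefficient": repr(float(weight))},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_linear_pmml(1.5, {"x1": 2.0, "x2": -0.5}))
```

The point of the sketch is that the data dictionary, the mining schema, and the model parameters all live in one self-describing XML document, which is what lets a consuming system interpret the model without access to the tool that built it.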
The availability
of a standard such as PMML, combined with scoring solutions in the cloud, for
Hadoop, and in-database, makes it possible for predictive analytics to fulfill
its promise and crack the big data code. Zementis, Inc. has been at the
forefront of PMML-based scoring, first through its ADAPA Scoring Engine, which
is available for on-site deployment or as a service in the cloud (Amazon and IBM),
and lately through its Universal PMML Plug-in, which is offered for a range of
databases and for Hadoop. Zementis has partnered with Revolution Analytics, so
that predictive solutions built in R can benefit from the vast scoring infrastructure
already in place. I am proud to be associated with Zementis and excited to be
part of an ever-growing PMML community.
A PMML package for R that exports all kinds of predictive
models is available directly from CRAN.
Traditionally, the PMML package offered support for the
following data mining algorithms:
- ksvm (kernlab): Support Vector Machines
- nnet: Neural Networks
- rpart: C&RT Decision Trees
- lm & glm (stats): Linear and Binary Logistic Regression Models
- arules: Association Rules
- kmeans and hclust: Clustering Models
Recently, it has been expanded to support:
- multinom (nnet): Multinomial Logistic Regression Models
- glm (stats): Generalized Linear Models for classification and regression with a wide variety of link functions
- randomForest: Random Forest Models for classification and regression
- rsf (randomSurvivalForest): Random Survival Forest Models
This expansion is still ongoing as the R community implements support for
other packages and techniques. For more on the PMML package, please take a look
at the paper we published with Graham Williams from Togaware in The R Journal,
"PMML: An Open Standard for Sharing Models".
There may be quite a few reasons for you to move your
predictive solution from R to an independent deployment platform. Among them,
you may want parallel execution on big data or real-time scoring for
applications such as fraud detection or recommender systems. With PMML, you can
easily move your model to the cloud or inside the database for scoring, or
even have it executed on Hadoop. It is really up to you! On top of that, PMML
allows for side-by-side deployment of predictive assets from R as well as from
commercial data mining tools, supporting a multi-vendor environment as well as
platform-independent deployment.
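As a rough illustration of what platform-independent scoring means, the following Python sketch reads a hand-written PMML regression model and scores a record using only the standard library. The document, the field names (`amount`, `age`, `risk`), and the `score` function are made up for this example. It handles only a single `RegressionTable` with numeric predictors and is in no way a substitute for a full scoring engine.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

# Hand-written example document; field names and coefficients are made up.
EXAMPLE_PMML = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="demo" functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="age"/>
      <MiningField name="risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.25"/>
      <NumericPredictor name="age" coefficient="-0.125"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score(pmml_text, record):
    """Score one record against the first RegressionTable in the document."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionModel/RegressionTable found")

    # Prediction = intercept + sum(coefficient * value ** exponent).
    result = float(table.get("intercept", "0"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        exponent = int(predictor.get("exponent", "1"))  # exponent defaults to 1
        value = float(record[predictor.get("name")])
        result += float(predictor.get("coefficient")) * value ** exponent
    return result


if __name__ == "__main__":
    print(score(EXAMPLE_PMML, {"amount": 4.0, "age": 2.0}))
```

Nothing in the scorer depends on the tool that produced the model. The same file could be handed to any consumer that understands PMML, whether that consumer runs in a database, on Hadoop, or in the cloud.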
More and more companies and individuals are using the PMML
standard for the obvious benefits it provides, putting their predictive
solutions on the fast track. With PMML, the speed of predictive solutions can
be on par with the speed of business.
Dr. Alex Guazzelli is the VP of Analytics
at Zementis Inc. where he is responsible for developing core technology and
predictive solutions under ADAPA, a PMML-based decisioning platform. With more
than 20 years of experience in predictive analytics, Dr. Guazzelli holds a PhD
in Computer Science from the University of Southern California and has co-authored
the book PMML
in Action: Unleashing the Power of Open Standards for Data Mining and
Predictive Analytics, now in its second edition (paperback and
Kindle). You can follow him at @DrAlexGuazzelli.
Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By David Smith
David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products. David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.