
The difference between SEMMA and CRISP-DM

Cindy Ng, Sr. Manager, Content, Starburst


SEMMA stands for Sample, Explore, Modify, Model, Assess.

CRISP-DM stands for Cross-Industry Standard Process for Data Mining.

SEMMA and CRISP-DM are both process models used in the field of data mining and machine learning to guide the steps involved in developing predictive models and extracting useful insights from data. 

They overlap in their data preparation, modeling, and evaluation steps, but they differ in origin, scope, and how closely they are tied to a particular tool. Below is a comparison of SEMMA and CRISP-DM.

Origin and Purpose of SEMMA and CRISP-DM

CRISP-DM: Developed in the late 1990s, CRISP-DM is a comprehensive and widely recognized framework for data mining projects. It was designed to provide a structured approach to guide the entire data mining process, from understanding business objectives to deploying models.

SEMMA: SEMMA was developed by SAS Institute as a framework for its data mining software, notably SAS Enterprise Miner. It focuses on the technical steps of building a model and is closely tied to SAS's software suite, although it has also been applied more broadly to data analysis and modeling.

Six phases in CRISP-DM vs. five phases in SEMMA

CRISP-DM: CRISP-DM defines six distinct phases: 

  1. business understanding, 
  2. data understanding, 
  3. data preparation, 
  4. modeling, 
  5. evaluation, and 
  6. deployment.

CRISP-DM covers the entire data mining project lifecycle, including understanding business goals, data collection and preparation, model building, evaluation, and deployment.

SEMMA: SEMMA outlines five key phases: 

  1. sample, 
  2. explore, 
  3. modify, 
  4. model, and 
  5. assess.

SEMMA concentrates on the technical work of building a model: sampling the data, exploring it, modifying it, modeling, and assessing the result. Unlike CRISP-DM, it has no explicit phase for business understanding or deployment.
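To make the difference in scope concrete, the two phase lists can be lined up side by side. The short Python sketch below is illustrative only: the phase names come from the lists above, but the `SEMMA_TO_CRISP_DM` mapping is an assumption (a rough, commonly drawn alignment, not an official correspondence published by SAS or the CRISP-DM consortium), and `uncovered_crisp_dm_phases` is a hypothetical helper name.

```python
# Phase names exactly as listed in this article.
CRISP_DM_PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
    "deployment",
]

SEMMA_PHASES = ["sample", "explore", "modify", "model", "assess"]

# ASSUMPTION: a rough alignment for illustration, not an official mapping.
SEMMA_TO_CRISP_DM = {
    "sample": "data understanding",
    "explore": "data understanding",
    "modify": "data preparation",
    "model": "modeling",
    "assess": "evaluation",
}


def uncovered_crisp_dm_phases(mapping=SEMMA_TO_CRISP_DM):
    """Return the CRISP-DM phases that no SEMMA phase maps onto, in order."""
    covered = set(mapping.values())
    return [phase for phase in CRISP_DM_PHASES if phase not in covered]


if __name__ == "__main__":
    for semma_phase in SEMMA_PHASES:
        print(f"{semma_phase:<8} -> {SEMMA_TO_CRISP_DM[semma_phase]}")
    print("No SEMMA counterpart:", uncovered_crisp_dm_phases())
```

Under this assumed alignment, the two CRISP-DM phases left without a SEMMA counterpart are business understanding and deployment, which is the gap a team using SEMMA would typically need to fill from another methodology.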

Which is more flexible: SEMMA or CRISP-DM?

CRISP-DM: CRISP-DM is considered a more flexible and comprehensive framework, suitable for a wide range of data mining and machine learning projects.

SEMMA: SEMMA is more specific to SAS software and is often used as a companion to other, more comprehensive methodologies like CRISP-DM.

Which is more practical?

CRISP-DM is widely adopted and has extensive documentation and support from the data mining community. It is generally seen as a practical and effective methodology for data mining projects.

SEMMA, while useful for model-building within the SAS environment, may be less familiar and less widely adopted outside of the SAS user base.

CRISP-DM Lifecycle versus SEMMA Framework

CRISP-DM is a more comprehensive and widely accepted data mining process model that covers the entire project lifecycle. 

SEMMA, on the other hand, is a more specialized framework that focuses primarily on the modeling phase and is closely associated with SAS software.

The choice between the two depends on the specific needs and tools of a given project, with CRISP-DM being the more general and flexible approach.
