data methodology

In a world driven by data, where more and more projects analyze it to support company decision-making, the way the phases of a data analytics project are designed has become a key factor in the success of the study.

A solid methodology that guides professionals through the different stages of the analysis is crucial for obtaining accurate, meaningful results that can benefit the business and inform the strategic direction it takes in the future.

In this article, we explore how to design the phases of a data analytics project and the key steps of the process in any context. One example is insurance brokers, who are increasingly interested in exploiting their data and in the technologies that make this possible.

An appropriate methodology

A well-structured methodology can make all the difference in the data analysis process, for insurance brokers and for any other data-driven industry. Several methodologies exist. Here we look at CRISP-DM (CRoss-Industry Standard Process for Data Mining), which is widely used by data science professionals.

This methodology consists of six phases: understand the business, understand the data, prepare the data, create the model, evaluate it and deploy it.

Phase 1: Understand the business

Every project begins with understanding the business in which it will be developed. To do so, you must first:

  1. Define the business objectives and understand thoroughly what you want to achieve.
  2. Assess the situation: determine the available resources and project requirements, evaluate risks and contingencies, and perform a cost-benefit analysis.
  3. Set the technical data goals and define how success will be measured from a technical perspective.
  4. Create a project plan, selecting the technologies and tools to use and defining detailed plans for each phase of the project.
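Tasks 2 and 3 are easier to verify later if the cost-benefit estimate and the technical success criterion are written down precisely. The Python sketch below uses only the standard library and shows one possible way to do this. The class `SuccessCriterion`, the function `net_benefit`, the claim-cost goal and all figures are hypothetical and exist only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """A technical success criterion agreed in Phase 1 (hypothetical example)."""

    metric: str             # name of the metric, e.g. "mean_absolute_error"
    threshold: float        # value the metric has to reach
    higher_is_better: bool  # direction in which the metric improves

    def is_met(self, value: float) -> bool:
        """Return True if a measured metric value satisfies the criterion."""
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def net_benefit(expected_benefit: float, costs: list[float]) -> float:
    """Minimal cost-benefit check: expected benefit minus the sum of all costs."""
    return expected_benefit - sum(costs)


# Hypothetical goal: predict claim costs with a mean absolute error of at most 500.
criterion = SuccessCriterion("mean_absolute_error", 500.0, higher_is_better=False)

# Made-up figures: expected yearly saving versus staff and tooling costs.
balance = net_benefit(120_000.0, costs=[60_000.0, 15_000.0])

print(criterion.is_met(430.0))  # True
print(balance)                  # 45000.0
```

Writing the criterion down as data lets the evaluation phase check it automatically instead of reinterpreting it.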

Phase 2: Understand the data

This phase focuses on identifying, collecting and analyzing the data sets that will be used to achieve the project's goals. The four key tasks are:

  1. Collect the initial data and, if necessary, load it into analysis tools.
  2. Describe the data and document its initial properties, such as format, number of records and key fields.
  3. Explore the data in more depth: query it, visualize it and identify relationships within it.
  4. Verify the quality of the data, assessing how clean it is and documenting any problems found.
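The following minimal Python sketch illustrates these four tasks. It loads a small CSV extract, describes it, takes a first look at a relationship and counts missing values. It uses only the standard library. The fields, the records and the function names are made up for the example and do not come from real broker data.

```python
import csv
import io
import statistics

# Toy, made-up policy records standing in for a real extract.
RAW_CSV = """policy_id,age,premium,claims
P1,34,420.0,0
P2,51,,1
P3,29,380.5,0
P4,,610.0,2
P5,45,505.0,1
"""


def load_records(text: str) -> list[dict[str, str]]:
    """Task 1: collect the initial data and load it for analysis."""
    return list(csv.DictReader(io.StringIO(text)))


def describe(records: list[dict[str, str]]) -> dict:
    """Task 2: document basic properties such as record count and fields."""
    return {"n_records": len(records), "fields": list(records[0].keys())}


def mean_by_group(records, value_field, group_field):
    """Task 3: a first look at a relationship (mean value per group)."""
    groups: dict[str, list[float]] = {}
    for r in records:
        if r[value_field] == "":
            continue  # skip rows where the value is missing
        groups.setdefault(r[group_field], []).append(float(r[value_field]))
    return {g: statistics.mean(vals) for g, vals in groups.items()}


def missing_counts(records):
    """Task 4: count empty values per field as a basic quality check."""
    return {f: sum(1 for r in records if r[f] == "") for f in records[0]}


records = load_records(RAW_CSV)
print(describe(records))
print(mean_by_group(records, "premium", "claims"))
print(missing_counts(records))
```

In practice, a dedicated data analysis library would usually handle these steps. The standard library is used here only to keep the sketch self-contained.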

Phase 3: Prepare the data

Preparing the final data sets for modeling is commonly estimated to take around 80% of the development time of a data project. This phase has five key tasks:

  1. Select the data: determine which data sets will be used and document the reasons for including or excluding them.
  2. Clean the data: correct, impute or delete erroneous values to keep the data as clean as possible.
  3. Construct data by deriving new attributes that will be useful.
  4. Integrate and combine data from multiple sources to create new data sets.
  5. Format the data as needed.
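The five tasks can be sketched on two toy sources: a policy table and a claims table. This is an illustration built on assumptions, not a prescribed procedure. The field names, the median imputation, the derived `tenure_years` attribute and the reference year 2024 are all hypothetical choices.

```python
import statistics

# Toy, made-up data from two hypothetical sources; values arrive as strings,
# as they would from a CSV export.
policies = [
    {"policy_id": "P1", "age": "34", "premium": "420.0",
     "start_year": "2019", "agent_notes": "renewed early"},
    {"policy_id": "P2", "age": "", "premium": "505.0",
     "start_year": "2021", "agent_notes": ""},
    {"policy_id": "P3", "age": "58", "premium": "610.0",
     "start_year": "2015", "agent_notes": "n/a"},
]
claims = [
    {"policy_id": "P1", "amount": "1200.0"},
    {"policy_id": "P3", "amount": "300.0"},
    {"policy_id": "P3", "amount": "450.0"},
]

# Task 1: fields kept for modeling ("agent_notes" excluded as free text).
KEEP = ["policy_id", "age", "premium", "start_year"]
REFERENCE_YEAR = 2024  # hypothetical reference year for the derived attribute


def prepare(policies, claims):
    # Task 1: select only the fields kept for modeling.
    rows = [{k: p[k] for k in KEEP} for p in policies]

    # Task 2: clean by imputing missing ages with the median of known ages.
    known_ages = [float(r["age"]) for r in rows if r["age"] != ""]
    median_age = statistics.median(known_ages)
    for r in rows:
        r["age"] = float(r["age"]) if r["age"] != "" else median_age

    # Task 3: construct a new attribute, policy tenure in years.
    for r in rows:
        r["tenure_years"] = REFERENCE_YEAR - int(r["start_year"])

    # Task 4: integrate the total claim amount per policy from the second source.
    totals: dict[str, float] = {}
    for c in claims:
        totals[c["policy_id"]] = totals.get(c["policy_id"], 0.0) + float(c["amount"])
    for r in rows:
        r["total_claims"] = totals.get(r["policy_id"], 0.0)

    # Task 5: format the records with the types the modeling step expects.
    return [
        {
            "policy_id": r["policy_id"],
            "age": r["age"],
            "premium": float(r["premium"]),
            "tenure_years": r["tenure_years"],
            "total_claims": r["total_claims"],
        }
        for r in rows
    ]


for row in prepare(policies, claims):
    print(row)
```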

Phase 4: Create the model

This is usually the shortest phase of the project, although several models based on different techniques will likely be built and evaluated in order to keep the one that performs best. This phase consists of four tasks:

  1. Select modeling techniques, determining which algorithms will be applied.
  2. Generate a test design, with specific data sets for training, testing and validation.
  3. Build the model by running the chosen algorithm on the training data, for example fitting a linear regression.
  4. Assess the models and keep the one that gives the best results according to domain knowledge, the predefined success criteria and the test design.
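Here is a minimal sketch of tasks 2 to 4. It assumes a single numeric feature and synthetic data. The records are split into training, validation and test sets. A one-feature linear regression is then fitted by ordinary least squares and compared on the validation set with a naive baseline that always predicts the training mean. The functions and the data generator are illustrative and are not part of CRISP-DM itself.

```python
import random


def split(rows, train=0.6, valid=0.2, seed=42):
    """Test design: shuffle once with a fixed seed, then cut into train/validation/test."""
    rows = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_valid = round(len(rows) * valid)
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]


def fit_linear_regression(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, rows):
    """Average absolute difference between predictions and actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


# Synthetic data: premium rising roughly linearly with age (made up for the example).
rng = random.Random(0)
data = [(age, 200 + 8 * age + rng.gauss(0, 20)) for age in range(20, 70)]

train, valid, test = split(data)  # `test` stays untouched until final evaluation
xs, ys = [x for x, _ in train], [y for _, y in train]
candidates = {
    "linear_regression": fit_linear_regression(xs, ys),
    "baseline_mean": (0.0, sum(ys) / len(ys)),  # always predicts the training mean
}
scores = {name: mean_absolute_error(m, valid) for name, m in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

A trivial baseline among the candidates is a cheap way to check that the chosen technique adds value. The test set is deliberately held back for the evaluation phase.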

Phase 5: Evaluate the model

The evaluation phase takes a broader view, asking which model best suits the business and what to do next. This phase has three tasks:

  1. Evaluate the results, asking whether the models meet the business success criteria and which one(s) should be approved for the business.
  2. Review the work performed, checking whether anything was overlooked and whether all steps were executed correctly.
  3. Determine the next steps based on the previous tasks: proceed to deployment, iterate further or start new projects.
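Part of this decision can be made explicit in code. The sketch below assumes that the success criterion is a maximum test error agreed in Phase 1. A model is proposed for deployment only if it meets that criterion and the review of the work found no problems. Otherwise, the team iterates. The `Candidate` class and the error values are hypothetical, and the decision to start a new project is left to human judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    test_mae: float  # error measured on the held-out test set


def approved_models(candidates, max_mae):
    """Task 1: keep the models that meet the success criterion set in Phase 1."""
    return [c for c in candidates if c.test_mae <= max_mae]


def next_step(candidates, max_mae, review_ok):
    """Task 3: deploy only if a model is approved and the review found no gaps."""
    approved = approved_models(candidates, max_mae)
    if approved and review_ok:
        best = min(approved, key=lambda c: c.test_mae)
        return f"deploy {best.name}"
    return "iterate"


# Made-up test errors for two candidate models.
candidates = [Candidate("linear_regression", 18.0), Candidate("baseline_mean", 95.0)]
print(next_step(candidates, max_mae=50.0, review_ok=True))   # deploy linear_regression
print(next_step(candidates, max_mae=10.0, review_ok=True))   # iterate
print(next_step(candidates, max_mae=50.0, review_ok=False))  # iterate
```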

Phase 6: Deploy the model

A model is only useful if its results can be accessed by those who need them. The complexity of this stage can vary significantly depending on the scope of the deployment, and it is made up of four tasks:

  1. Develop and document a plan for deploying the model.
  2. Plan monitoring, maintenance and optimization to avoid problems once the model is in operation.
  3. Document and summarize the project in a final report of its results.
  4. Review the project, reflecting on what went well, what could have been better and how to improve in the future.
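For task 2, a monitoring plan often comes down to a few checks run on a schedule. The sketch below assumes two such checks. The first asks whether the recent prediction error has risen more than 20% above the error measured at deployment. The second asks whether the mean of an input feature has moved by more than one training standard deviation. Both thresholds and all function names are assumptions made for illustration.

```python
import statistics


def error_degraded(baseline_mae, recent_abs_errors, tolerance=0.2):
    """True if the recent mean absolute error exceeds the baseline by more than `tolerance`."""
    return statistics.mean(recent_abs_errors) > baseline_mae * (1 + tolerance)


def input_shift(train_values, recent_values):
    """Shift of a feature's recent mean, measured in training standard deviations."""
    diff = abs(statistics.mean(recent_values) - statistics.mean(train_values))
    return diff / statistics.stdev(train_values)


def monitoring_report(baseline_mae, recent_abs_errors, train_ages, recent_ages, max_shift=1.0):
    """Collect the alerts that would trigger a review or retraining (thresholds are assumptions)."""
    alerts = []
    if error_degraded(baseline_mae, recent_abs_errors):
        alerts.append("prediction error has degraded")
    if input_shift(train_ages, recent_ages) > max_shift:
        alerts.append("input data has drifted (age)")
    return alerts


print(monitoring_report(20.0, [18, 22, 25, 30], [30, 40, 50], [38, 42, 41]))  # []
print(monitoring_report(20.0, [30, 28, 35], [30, 40, 50], [55, 60, 65]))
```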

Designing the phases of a data analysis project is essential when we want a study to support decision-making. Relying on a solid methodology such as CRISP-DM is undoubtedly important, although it is not the only option. Such methodologies guide us through a clean, seamless process that produces accurate information we can trust.

Insurance brokers work every day with a large amount of usable information. An appropriate methodology, professionals who help design the phases of a data analysis project and specialized technological tools can turn this information into real value on the path to informed decision-making and continuous business improvement.

