How to run Segmentation · Crowdmines.ai

Segmentation has more steps than any other tool on the platform. Three explicit human-in-the-loop checkpoints (Marketing-Truth review, method selection and weighting setup) come before clustering runs, and a solution-selection step comes before the report is produced. Here is what the user experiences in the chat.

1
Ask the question
The user types something like:
- "Run a segmentation on our customer base."
- "Find customer groups in this study."
- "Build personas from these survey responses."
The platform recognises the intent and selects Segmentation.
2
State the research question
The chat asks the user to articulate the business question driving the segmentation. That question is then threaded through the persona narratives, the Profile-Target section, and the executive summary of the final report. For example:
- "Which customer groups exist in our portfolio and what are their defining needs?"
3
Map the variables
The chat asks the user to identify three groups of columns:
- Segmentation variables — the attitudinal, behavioural or needs-based items that define the segments. Typically 5 to 20 Likert items.
- Profiling variables — demographics, usage, brand awareness, NPS — used to describe the segments once they are formed. Never used to build them.
- Additional variables (optional, up to 15) — extra columns that aren't part of clustering or core profiling but should appear in the terminal cross-tab and the report's Market Insights section.
The platform suggests mappings based on column names, types and SPSS metadata. The user confirms or adjusts.
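The three column roles and their constraints can be sketched as a small data structure. This is an illustration only: the `VariableMapping` class, its field names and the `problems` method are hypothetical and are not the platform's API. It encodes the two rules stated above: a column is used either to build segments or to describe them, never both, and at most 15 additional variables are allowed.

```python
from dataclasses import dataclass, field

MAX_ADDITIONAL = 15  # limit stated in the step above


@dataclass
class VariableMapping:
    """Hypothetical container for the three column roles."""
    segmentation: list[str]
    profiling: list[str]
    additional: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return readable issues; an empty list means the mapping is valid."""
        issues = []
        # Profiling variables must never be used to build the segments.
        overlap = sorted(set(self.segmentation) & set(self.profiling))
        if overlap:
            issues.append(f"used for both building and profiling: {overlap}")
        # Additional variables sit outside clustering and core profiling.
        reused = sorted(
            set(self.additional) & (set(self.segmentation) | set(self.profiling))
        )
        if reused:
            issues.append(f"additional variables already mapped: {reused}")
        if len(self.additional) > MAX_ADDITIONAL:
            issues.append(
                f"{len(self.additional)} additional variables, limit is {MAX_ADDITIONAL}"
            )
        return issues
```

A mapping with `problems() == []` is ready for the next step; anything else is what the user would be asked to adjust.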
4
Marketing-Truth review
Before clustering runs, the chat surfaces any segmentation variable where 60% or more of respondents already sit in the top-2-box or bottom-2-box — variables where everyone already agrees (or already disagrees) and so cannot discriminate between segments.

The user reviews the flagged variables with their box scores and decides which to keep and which to drop. This is the first human-in-the-loop checkpoint.
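The flagging rule can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: it assumes 5-point Likert items coded 1 to 5, treats codes 4 and 5 as top-2-box and codes 1 and 2 as bottom-2-box, and ignores missing answers. The function names are hypothetical.

```python
def box_shares(responses, scale_max=5):
    """Return (top2_share, bottom2_share) for one Likert item.

    Assumes integer codes 1..scale_max; missing answers (None) are ignored.
    """
    valid = [r for r in responses if r is not None]
    if not valid:
        return 0.0, 0.0
    top2 = sum(r >= scale_max - 1 for r in valid) / len(valid)
    bottom2 = sum(r <= 2 for r in valid) / len(valid)
    return top2, bottom2


def flag_low_discrimination(data, threshold=0.60, scale_max=5):
    """Map each flagged variable name to its (top2, bottom2) shares."""
    flagged = {}
    for name, responses in data.items():
        top2, bottom2 = box_shares(responses, scale_max)
        # 60% or more in either box means the item barely discriminates.
        if top2 >= threshold or bottom2 >= threshold:
            flagged[name] = (top2, bottom2)
    return flagged
```

Given a dictionary of column name to response codes, `flag_low_discrimination` returns only the variables the user would be asked to review, together with the box scores shown alongside them.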
5
Method-selection review
The orchestrator scores each clustering method against the data profile, paired with a dimensionality-reduction option, and presents a ranked list with reasons:
- "K-Means + URF — score 78. Numeric segmentation block, n large, p moderate."
- "KAMILA + none — score 72. Mixed data, sample healthy, no scaling needed."
The user can accept the top recommendation, pick a different one, or request a specific list of methods. This is the second human-in-the-loop checkpoint.
6
Weighting setup
If the client has population targets (e.g. age × region census distributions), they are collected here for RIM weighting. Weighting applies to the cross-tabs and the report — not to the clustering itself, which runs on the unweighted respondent set. This is the third human-in-the-loop checkpoint; users with no weighting targets can skip it.
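RIM weighting (also called raking) adjusts respondent weights one variable at a time until the weighted margins match the targets. The sketch below is a generic textbook version under stated assumptions, not the platform's code: targets are given as marginal proportions per variable, every respondent's category appears in the targets, and every target category has at least one respondent. The function name and argument layout are hypothetical.

```python
def rim_weights(respondents, targets, max_iter=100, tol=1e-6):
    """Rake weights so weighted margins match the target proportions.

    respondents: list of dicts, e.g. {"age": "18-34", "region": "North"}
    targets: {variable: {category: target proportion}}, each summing to 1
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        max_shift = 0.0
        for var, target in targets.items():
            total = sum(weights)
            # Current weighted share of each category of this variable.
            share = {cat: 0.0 for cat in target}
            for w, row in zip(weights, respondents):
                share[row[var]] += w / total
            # Scale each respondent so this margin hits its target.
            for i, row in enumerate(respondents):
                factor = target[row[var]] / share[row[var]]
                weights[i] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:
            break  # all margins already match within tolerance
    # Normalise to a mean weight of 1 so the weighted base equals n.
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]
```

The returned weights would be applied when tabulating the cross-tabs and report figures; consistent with the step above, they play no part in the clustering itself.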
7
Cluster-count sweep
Behind the scenes, the orchestrator dispatches every cluster count from 2 to 10 (or a user-provided list) to every selected method, in parallel. A typical run produces 30 to 60 candidate solutions in two to five minutes.
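The shape of the sweep can be illustrated as a grid of (method, cluster count) jobs run in parallel. This is a sketch only: `run_clustering` is a hypothetical placeholder that returns a stub instead of fitting a model, and the use of a thread pool is an assumption about how parallel dispatch might look, not a description of the orchestrator.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_clustering(method, k):
    """Placeholder for one clustering job; returns a stub result."""
    return {"method": method, "k": k}


def sweep(methods, cluster_counts=range(2, 11)):
    """Dispatch every (method, k) pair in parallel and collect the results."""
    jobs = list(product(methods, cluster_counts))
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the submitted jobs.
        return list(pool.map(lambda job: run_clustering(*job), jobs))
```

With the default range of 2 to 10 there are nine cluster counts, so four selected methods yield 36 candidate solutions, which sits inside the typical 30 to 60 quoted above.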
8
Choose a solution
The chat returns the top three solutions after pruning, each with:
- A composite score (0–5)
- Silhouette, Calinski-Harabasz and Davies-Bouldin metrics
- The number of segments and their sizes
- The method and cluster count that produced it
- A short description of the dominant attitudinal traits in each segment
The user picks one — typically trading off size balance, segment count and interpretability.
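For readers unfamiliar with the metrics listed, the silhouette coefficient is the simplest to illustrate: for each respondent it compares the mean distance to its own segment (a) with the mean distance to the nearest other segment (b), giving (b - a) / max(a, b). The pure-Python sketch below uses Euclidean distance and assumes at least two clusters and no cluster made entirely of identical points. It is not the platform's code, and the 0 to 5 composite score is not reproduced because its formula is not described here.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, hence len - 1).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated segments; values near 0 or below indicate overlap. Higher is also better for Calinski-Harabasz, whereas lower is better for Davies-Bouldin.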
9
Review results
Once a solution is chosen, the chat returns:
- A multi-section deep-research report — Overview, Sample, Segment Building, Category Deep-Dives, Profile-Target, Market Insights, Next Steps and Appendix, with an LLM-generated Executive Summary stitched in at the top
- A per-solution cross-tab workbook for every surviving solution the orchestrator considered
- A comprehensive cross-tab workbook for the chosen solution, including the additional variables
- Persona narratives embedded in the report, one per segment
The cross-tabs come pre-formatted with a hyperlinked table of contents, significance letters on every cell, and box-score rows for Likert variables.
10
Follow up
The user can refine or branch into related tools in the same conversation:
- "Re-run with only the top-box-flagged variable dropped."
- "What does the 4-segment solution look like instead?"
- "Build a typing tool from this segmentation."
No re-uploading — the chat remembers the dataset and prior analysis.
