Segmentation — who are our customers really, and how should we organise around them?
Segmentation turns a survey into a small number of distinct, actionable respondent groups — and delivers a profiled, narrated report explaining each one. It is not a single clustering algorithm in a wrapper: an orchestrator profiles the data, fans out across seven methods and a full cluster-count sweep, scores every candidate solution, and hands the one you choose to a deep-research report microservice. The markdown report and the significance-tested Excel workbook are the two artefacts that traditionally take a research agency four to six weeks.
The business question it answers
“Who are our customers really, and how should we organise our marketing, product and CX work around them?”
Say a beverage brand fields a 60-question study with 1,800 respondents. They feed in twelve attitudinal items — need-states, occasion preferences, brand-attitude statements — as segmentation variables, and twenty more — demographics, usage frequency, brand awareness, NPS — as profiling variables. The orchestrator screens the twelve for items where everyone already agrees, runs several clustering methods in parallel across k = 2..10, prunes the dominant-cluster solutions, and returns the top three. The user picks one — six segments, balanced sizes, silhouette 0.31, a KAMILA solution — and the chat hands back a multi-section markdown report with named personas, a significance-tested Excel workbook, and an executive summary slotted ahead of section one.
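As a rough illustration of that flow, the sketch below screens items for consensus, builds the method-by-k grid and prunes dominant-cluster solutions before ranking. It is a minimal Python sketch, not the orchestrator's code. The function names, the 1-5 Likert coding, the top-two-box reading of "agree", the solution dictionaries and every score in the example data are assumptions made for illustration. Only the 60% and 70% thresholds and the k = 2 to 10 range come from the methodology described on this page.

```python
from itertools import product

AGREEMENT_FLAG = 0.60  # pre-flight threshold quoted in the methodology
DOMINANCE_CUT = 0.70   # pruning threshold quoted in the methodology


def flag_consensus_items(responses, agree_codes=(4, 5), disagree_codes=(1, 2)):
    """Return item names where >= 60% agree or >= 60% disagree.

    `responses` maps item name -> non-empty list of Likert codes.
    Treating codes 4-5 as "agree" and 1-2 as "disagree" is an assumption.
    """
    flagged = []
    for item, codes in responses.items():
        n = len(codes)
        agree = sum(c in agree_codes for c in codes) / n
        disagree = sum(c in disagree_codes for c in codes) / n
        if agree >= AGREEMENT_FLAG or disagree >= AGREEMENT_FLAG:
            flagged.append(item)
    return flagged


def candidate_grid(methods, k_min=2, k_max=10):
    """Every (method, k) pair the sweep would run."""
    return list(product(methods, range(k_min, k_max + 1)))


def prune_and_rank(solutions, top_n=3):
    """Drop solutions with one segment at >= 70% of the sample, keep the best top_n.

    Each solution is a dict with 'sizes' (segment counts) and 'score' (0-5 composite).
    """
    kept = [s for s in solutions
            if max(s["sizes"]) / sum(s["sizes"]) < DOMINANCE_CUT]
    return sorted(kept, key=lambda s: s["score"], reverse=True)[:top_n]


# Illustrative data only: item names, segment sizes and scores are made up.
responses = {
    "likes_sweet": [5, 4, 4, 5, 2, 4, 5, 4, 3, 5],      # 80% agree -> flagged
    "buys_on_impulse": [1, 3, 5, 2, 4, 3, 2, 5, 1, 4],  # 40% / 40% -> kept
}
solutions = [
    {"label": "kmeans k=2", "sizes": [1400, 400], "score": 4.6},  # 78% in one segment
    {"label": "kamila k=6", "sizes": [340, 320, 310, 300, 280, 250], "score": 4.1},
    {"label": "lca k=4", "sizes": [600, 500, 400, 300], "score": 3.7},
    {"label": "kmeans k=5", "sizes": [500, 400, 350, 300, 250], "score": 3.2},
    {"label": "hier k=3", "sizes": [900, 500, 400], "score": 2.9},
]

flagged = flag_consensus_items(responses)
grid = candidate_grid(["kmeans", "lca", "kamila", "kprototypes"])
top_three = [s["label"] for s in prune_and_rank(solutions)]
print(flagged, len(grid), top_three)
```

Note that the highest-scoring solution in the made-up data is discarded because one segment dominates it, which is the point of pruning before ranking.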
How the methodology works
Segmentation is the most multi-stage tool on the platform. The orchestrator types the variables, decides which clustering methods suit the data, runs them across a sweep of cluster counts in parallel, scores and prunes the resulting solutions, builds significance-tested cross-tab workbooks, and hands the chosen solution to a deep-research report microservice.
- Seven methods, auto-selected. Six clustering algorithms — K-Means, Hierarchical, LCA, K-Prototypes, VarSelLCM and KAMILA — plus URF importance filtering. The orchestrator scores each method against the data profile and pairs it with a dimension-reduction option: none, PCA or URF.
- Marketing-Truth pre-flight. Any segmentation variable where ≥ 60% of respondents already agree (or already disagree) is flagged before clustering, because a typology built on a variable everyone agrees with is dead on arrival.
- A full k-sweep, not a guess. Every k from 2 to 10 runs against every chosen method in parallel — a typical run produces 30–60 candidate solutions, instead of eyeballing a single silhouette plot.
- Composite scoring and pruning. Each solution is scored on the silhouette, Calinski-Harabasz and Davies-Bouldin indices, which are fused into a single 0–5 composite score. Any solution where one segment holds ≥ 70% of the sample is discarded; the top three are returned for you to compare.
- Significance-tested cross-tabs. Pairwise z-tests on every cell, ANOVA and pairwise Welch’s t-tests on numeric profiles, and automatic top/middle/bottom-box rows for Likert items, all delivered in a “Corporate Blue” Excel workbook with a hyperlinked table of contents.
- Evidence-bound persona narratives. The report is LLM-generated, but an over- or under-index is called out only when the index is above 130 or below 70 and the cell is statistically significant, so there is no agency-deck inflation.
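The evidence-bound rule in the last bullet can be sketched as a simple gate: a cell is mentioned only if its index clears the 130 / 70 bar and a significance test agrees. The Python below is a minimal sketch under stated assumptions rather than the platform's implementation. It assumes the index is the segment percentage divided by the total-sample percentage times 100, that "passes" means at or beyond the bar, that significance is a pooled two-proportion z-test of the segment against the rest of the sample, and that the critical value is 1.96. All function names are hypothetical.

```python
import math


def index_vs_total(segment_pct, total_pct):
    """Index of a segment against the total sample (100 = on par)."""
    return 100.0 * segment_pct / total_pct


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic; returns 0.0 when the SE is zero."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (p1 - p2) / se


def callout(seg_hits, seg_n, rest_hits, rest_n, z_crit=1.96):
    """Return 'over-index', 'under-index' or None for one cross-tab cell.

    Assumption: the segment is tested against the rest of the sample.
    """
    total_pct = (seg_hits + rest_hits) / (seg_n + rest_n)
    idx = index_vs_total(seg_hits / seg_n, total_pct)
    z = two_proportion_z(seg_hits, seg_n, rest_hits, rest_n)
    if abs(z) < z_crit:
        return None  # not significant, so never narrated
    if idx >= 130:
        return "over-index"
    if idx <= 70:
        return "under-index"
    return None


# Same 50% vs 30% gap at two sample sizes (illustrative counts only).
large = callout(150, 300, 450, 1500)  # index 150, z is about 6.7
small = callout(10, 20, 24, 80)       # index about 147, z is about 1.7
print(large, small)
```

The two example cells show the same percentage gap and a similar index, yet only the larger one would be narrated, because the smaller one fails the significance check.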
What you see in the chat
The flow is multi-step with three explicit human-in-the-loop checkpoints — a Marketing-Truth review, a method-selection review, and weighting setup — before clustering runs, then a solution-selection step before the report is produced. You stay in control at every decision the analysis makes.
The deliverables are a multi-section deep-research report (Overview, Sample, Segment Building, Category Deep-Dives, Profile-Target, Market Insights, Next Steps and Appendix, with an executive summary stitched in at the top), a per-solution cross-tab workbook for every surviving solution, a comprehensive cross-tab workbook for the chosen solution, and persona narratives embedded one per segment. The report and the workbook are the headline artefacts: together they are the work that traditionally takes a research agency four to six weeks.
- We need to refresh our customer typology — the last one is five years old.
- Marketing is targeting one big audience. We need to break it into groups.
- Product wants to know which features matter to which kinds of user.
- We have attitudinal data sitting in a tracker and no one's used it for segmentation.
- Our agency quoted six weeks for a segmentation. Can you do this in days?
Only want demographic groups?
Segmentation builds attitudinal and behavioural typologies. If you only need to split the sample by age, region or channel, that’s a descriptive job — start with Data Exploration.
See Data Exploration →
From survey upload to narrated typology.
Segmentation is the most multi-step tool on the platform — three human-in-the-loop checkpoints sit between your data and the clustering run, so you steer every decision.
How Crowdmines compares to SPSS, specialist tools and consulting agencies.
Building a strategic customer segmentation has historically been one of the slowest and most expensive deliverables in market research — manual statistical work in SPSS, R or Mplus; point-and-click SaaS tools; or a four-to-eight-week agency engagement costed at $30K–$150K.
| Capability | Traditional (SPSS / R / Mplus) | SaaS tools (Displayr, Q, Sawtooth) | Agency / consulting | Crowdmines |
|---|---|---|---|---|
| Setup effort | Write clustering scripts, pre-process by hand, hand-pick variables | Point-and-click, but still requires manual variable typing and method choice | Brief the agency, send data, wait | Ask in the chat — variable typing, method choice and the k-sweep all automated |
| Clustering methods | One per run; analyst picks which | Usually one or two | Agency picks one (often K-Means or LCA) | Seven methods, auto-selected, plus URF and PCA dimension reduction |
| Mixed-type data | Manual encoding, often poorly handled | Typically forces a single variable type | Some agencies handle it; many don't | Native — Gower distance, K-Prototypes, KAMILA and VarSelLCM |
| k selection & validation | Analyst eyeballs a scree or silhouette plot; codes metrics by hand | Some auto-selection; silhouette and a few extras | Agency picks k and reports metrics in the deck | Full k = 2–10 sweep; every solution scored on silhouette, Calinski-Harabasz and Davies-Bouldin, fused into a 0–5 score |
| Profiling & sig-testing | Manual cross-tabs and chi-square / ANOVA in SPSS or Excel | Built-in cross-tabs; significance testing varies | Standard agency deliverable | Pairwise z-tests on every cell, ANOVA and Welch's t-tests, box-score rows |
| Deep-research report | Analyst hand-writes — days of work | Limited templated output | Forty-page deck delivered weeks later | Eight-section markdown report with an executive summary, auto-generated |
| Turnaround | Days to weeks | Hours to days | 4–8 weeks | Minutes to cluster, minutes to report |
| Cost per analysis | Analyst time (days to weeks) | Software licence + analyst time | $30K–$150K per engagement | Platform subscription — unlimited runs |
Refresh your typology in days, not six weeks.
Built for insights, brand and marketing teams refreshing or building a strategic segmentation. Seven methods, a full k-sweep, significance-tested cross-tabs and a narrated deep-research report — all inside a chat.