Segmentation

Typing Tool — turn a segmentation into a working classifier.

You already have segments. The Typing Tool finds the shortest subset of questions that still reliably places a new respondent into the right segment, then ships it as an Excel typing-tool workbook the contact centre can use that afternoon, or a packaged model bundle for teams embedding it in their own systems.

Excel workbook or model bundle · 9-rung feature ladder · Cross-validated accuracy · Calibration scoring (ECE) · Triple-metric screening

Feature-set ladder · financial-services typology · Live result

Accuracy vs. questionnaire length: six questions hit 84% accuracy, the recommended trade-off.

Recommended: FS-6 · Accuracy: 84% · 12-q ceiling: 89% · Output: XLSX

The business question it answers

“We invested in a segmentation. How do we keep using it without re-running the full study every time we meet a new customer?”

Say you’ve built a six-segment customer typology from a 50-question study with 4,000 respondents. Your CRM and contact-centre teams want to assign new prospects to those segments in real time — but no one is going to ask a new lead 50 questions. Typing Tool screens the 50, tests reduced sets, and reports: six questions get to 84% classification accuracy; ten questions get to 89%. You pick six, and the chat hands back an Excel workbook the contact centre can use that afternoon.

Two phases inside the same chat

Phase 1 — Evaluate. First you pick the output format — an Excel workbook for teams who score new respondents in a spreadsheet, or a model bundle for engineers embedding the classifier in their own systems. The platform then screens the candidate questions and tests a ladder of every size from 4 to 12 features — nine rungs.

The Excel path tests two interpretable classifiers — logistic regression and LDA (linear discriminant analysis) — for every rung; the model-bundle path tests up to fifteen classifier families and pickles the best.
Every combination is scored on cross-validated accuracy, macro-F1 (a balanced precision/recall measure) and calibration (how well the model’s confidence matches its hit rate, via Brier score and Expected Calibration Error).
The chat returns a ranked ladder and auto-suggests the smallest rung that clears all three quality gates — accuracy ≥ 75%, macro-F1 ≥ 70%, ECE ≤ 1.00 — with logistic preferred over LDA on ties.
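The calibration metric and the gate logic above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the `Rung` record, the function names, the ten equal-width confidence bins and the ladder figures are all assumptions made for the example. It also assumes a "tie" means two classifiers passing at the same rung size. The gate thresholds are the ones quoted in the text.

```python
from dataclasses import dataclass


def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean gap between stated confidence and hit rate."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        hit_rate = sum(h for _, h in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - hit_rate)
    return ece


@dataclass
class Rung:  # hypothetical record for one ladder configuration
    n_features: int
    classifier: str  # "logistic" or "lda"
    accuracy: float  # cross-validated, on a 0..1 scale
    macro_f1: float
    ece: float


def suggest_rung(rungs, min_acc=0.75, min_f1=0.70, max_ece=1.00):
    """Smallest rung clearing all three gates; logistic wins a same-size tie."""
    passing = [
        r for r in rungs
        if r.accuracy >= min_acc and r.macro_f1 >= min_f1 and r.ece <= max_ece
    ]
    if not passing:
        return None
    return min(passing, key=lambda r: (r.n_features, r.classifier != "logistic"))


# Made-up ladder results, for illustration only.
ladder = [
    Rung(4, "logistic", 0.71, 0.66, 0.06),
    Rung(4, "lda", 0.70, 0.65, 0.07),
    Rung(5, "logistic", 0.78, 0.69, 0.05),  # fails the macro-F1 gate
    Rung(5, "lda", 0.77, 0.68, 0.05),
    Rung(6, "lda", 0.84, 0.80, 0.04),
    Rung(6, "logistic", 0.84, 0.81, 0.04),
]
best = suggest_rung(ladder)
print(best)
```

In this made-up ladder the four- and five-question rungs each miss at least one gate, so the six-question logistic configuration is suggested.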

Phase 2 — Finalise. You pick a rung, typically trading questionnaire length against accuracy. The platform fits the final classifier on the full dataset and exports the deliverable for the path you chose: an Excel typing-tool workbook (plus the feature-set ladder CSV), or a pickled model bundle with a generated README and loader script. Drop in a new respondent’s answers, get back their predicted segment.
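To make "drop in answers, get back a segment" concrete, here is a minimal sketch of how a multinomial logistic typing tool could score one respondent. The segment names, question IDs, intercepts and coefficients are invented for illustration, and the real workbook's formulas and layout may differ.

```python
import math

# Hypothetical fitted parameters: three questions (q1..q3, answered on a
# 1-5 scale) and three made-up segments. Not taken from any real model.
INTERCEPTS = {"Savers": 0.2, "Borrowers": -0.1, "Investors": -0.4}
COEFFICIENTS = {
    "Savers": {"q1": 0.8, "q2": -0.3, "q3": 0.1},
    "Borrowers": {"q1": -0.5, "q2": 0.9, "q3": 0.0},
    "Investors": {"q1": 0.1, "q2": -0.2, "q3": 0.7},
}


def predict_segment(answers):
    """Return (predicted segment, per-segment probabilities) for one respondent."""
    # Linear score per segment: intercept + sum of coefficient * answer.
    scores = {
        seg: INTERCEPTS[seg]
        + sum(w * answers[q] for q, w in COEFFICIENTS[seg].items())
        for seg in INTERCEPTS
    }
    # Softmax, shifted by the top score for numerical stability.
    top = max(scores.values())
    exps = {seg: math.exp(s - top) for seg, s in scores.items()}
    total = sum(exps.values())
    probs = {seg: e / total for seg, e in exps.items()}
    return max(probs, key=probs.get), probs


segment, probs = predict_segment({"q1": 5, "q2": 1, "q3": 2})
print(segment, {s: round(p, 3) for s, p in probs.items()})
```

Each step is a weighted sum followed by an exponential and a division, which is why a classifier of this kind can be expressed with ordinary spreadsheet formulas.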

How feature screening works

The screening step reduces a large pool of candidate predictor questions (often 30–50+) down to a shortlist of the most segment-discriminating items. The platform fuses three independent scoring methods:

ANOVA F-statistic: how much each predictor’s mean differs across segments (linear separation).
Mutual Information: non-linear dependency between predictor and segment label.
SHAP importance: game-theoretic contribution from a tree-based classifier (interaction effects).

Bootstrap stability (30 iterations of MI scoring) guards against features that score high by chance. Correlation pruning at r > 0.90 prevents two highly correlated questions from both surviving the shortlist.
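The fusion and pruning steps can be sketched as follows. The page does not say how the three scores are combined, so averaging ranks is an assumption here, as is applying the 0.90 threshold to the absolute correlation. The scores and answer columns are made-up inputs, and the bootstrap stability step is omitted for brevity.

```python
import math


def ranks(scores):
    """Map each feature to its rank under one scoring method (1 = best)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {feat: pos for pos, feat in enumerate(ordered, start=1)}


def fuse(*score_tables):
    """Average each feature's rank across methods; lower is better."""
    rank_tables = [ranks(table) for table in score_tables]
    return {
        feat: sum(rt[feat] for rt in rank_tables) / len(rank_tables)
        for feat in rank_tables[0]
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def prune_correlated(ordered_features, data, threshold=0.90):
    """Walk features best-first; drop any too correlated with one already kept."""
    kept = []
    for feat in ordered_features:
        if all(abs(pearson(data[feat], data[k])) <= threshold for k in kept):
            kept.append(feat)
    return kept


# Made-up scores from the three methods, for illustration only.
anova = {"q1": 42.0, "q2": 40.5, "q3": 12.3, "q4": 3.1}
mutual_info = {"q1": 0.31, "q2": 0.33, "q3": 0.12, "q4": 0.02}
shap = {"q1": 0.20, "q2": 0.18, "q3": 0.09, "q4": 0.01}

# Made-up answer columns; q2 nearly duplicates q1.
data = {
    "q1": [1, 2, 3, 4, 5],
    "q2": [1, 2, 3, 4, 6],
    "q3": [5, 1, 4, 2, 3],
    "q4": [2, 2, 1, 3, 2],
}

fused = fuse(anova, mutual_info, shap)
ordered = sorted(fused, key=fused.get)
shortlist = prune_correlated(ordered, data)
print(shortlist)
```

Here q2 scores almost as well as q1 on every method, but it is dropped because it is nearly a copy of q1, so the shortlist keeps the next most informative independent questions instead.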

Required data

Baseline: Labelled segmentation dataset

Candidates: Pool of predictor questions

Min CV accuracy: 75% (default)

Min macro-F1: 0.70 (default)

Min sample: A few hundred labelled respondents

Use when you hear

We have segments — how do we keep classifying new customers cheaply?
We can’t run the full segmentation survey on every lead.
We want the contact centre / sales team / website to know which segment a person belongs to.
We did a segmentation last year — what’s the lightweight version?
How do we make the segmentation operational, not just a deck?

Disqualifier

No existing segmentation?

Start there first — Crowdmines’s Segmentation service builds the original typology; Typing Tool only operationalises one that already exists.

See Segmentation →

Step by step

From 50 questions to a working Excel workbook.

Two phases inside the same chat session — evaluate, then finalise.

Ask the question

“Build a typing tool for our customer segments” — the platform recognises the intent and selects the Typing Tool.

Map the variables

Identify the segment-label column and the pool of candidate predictor questions.

Pick the output format

Excel workbook (formula-native, no Python) or model bundle (a pickled classifier). The choice sets the classifier catalog. Optional filters can be applied at this step too.

Phase 1 — Evaluate

Triple-metric screening, then a ladder of every size from 4 to 12 features. The Excel path tests logistic + LDA — 18 configurations — cross-validated and ranked.

Pick a rung

The platform auto-suggests the smallest rung clearing all three gates — accuracy ≥ 75%, macro-F1 ≥ 70%, ECE ≤ 1.00.

Phase 2 — Finalise

Final model fit on the full dataset; scenario predictions for typical Low / Mid / High respondent profiles.

Download the deliverables

Excel path: typing-tool workbook + ladder CSV + report. Model-bundle path: pickled model + README + loader script + CSV + report.


Compared to

How Crowdmines compares to R / Python / SPSS and agency engagements.

Building a typing tool — a short-form classifier that assigns new respondents to existing segments — is traditionally a specialist task performed by a data scientist or research statistician using R, Python or SPSS’s discriminant analysis module. Some research agencies offer it as a service deliverable, typically scoped at 1–3 weeks and priced as a standalone engagement.

Capability	Traditional (R / Python / SPSS)	Agency service	Crowdmines
Setup effort	Write feature-selection code, set up CV pipelines, tune classifiers, export manually	Brief the agency, send data, wait for delivery	Ask in the chat — variables mapped, screening automated
Feature screening	Manual: analyst picks candidate features using domain knowledge or stepwise selection	Agency uses their preferred method (varies)	Triple-metric screening (ANOVA + Mutual Information + SHAP) with bootstrap stability and correlation pruning
Feature ladder	Analyst tests a few sizes manually, compares accuracy	Agency tests a few configurations	Systematic ladder — every size from 4 to 12 features — × 2 classifiers on the Excel path = 18 configurations tested and ranked automatically
Classifier types	Analyst picks one (usually logistic)	Agency picks one	Excel path: logistic + LDA for every rung. Model-bundle path: up to fifteen classifier families — ridge / lasso / elasticnet logistic, decision tree, random forest, SVC, naive Bayes, Gaussian process, MLP, and XGBoost / LightGBM / CatBoost when installed
Output paths	Single bespoke deliverable per project	Single deliverable	Two paths picked up-front — Excel workbook (formula-native, no Python) or model bundle (joblib pickle + README + loader script)
Evaluation metrics	Analyst computes accuracy; F1 and calibration often skipped	Agency reports accuracy (sometimes)	Accuracy, macro-F1, Brier score and ECE — all cross-validated with standard deviations
Auto-selection	Analyst picks manually	Agency recommends	Platform suggests the best rung with transparent trade-off reasoning
Operational deliverable	Analyst builds a custom Excel workbook or scoring function	Agency delivers an Excel workbook (their format)	Standardised Excel typing-tool workbook + trained model pickle + evaluation CSV — all auto-generated
Turnaround	Days (analyst time)	1–3 weeks	Minutes

Beta Program Open

Six questions, 84% accuracy — an Excel workbook the contact centre can use that afternoon.

Triple-metric feature screening, cross-validated accuracy & calibration, transparent accuracy-vs-length trade-off. Ship the operational artefact in minutes, not weeks.