Typing Tool — turn a segmentation into a working classifier.
You already have segments. The Typing Tool finds the shortest subset of questions that still reliably places a new respondent into the right segment, then ships it as an Excel typing-tool workbook the contact centre can use that afternoon, or a packaged model bundle for teams embedding it in their own systems.
The business question it answers
“We invested in a segmentation. How do we keep using it without re-running the full study every time we meet a new customer?”
Say you’ve built a six-segment customer typology from a 50-question study with 4,000 respondents. Your CRM and contact-centre teams want to assign new prospects to those segments in real time — but no one is going to ask a new lead 50 questions. Typing Tool screens the 50, tests reduced sets, and reports: six questions get to 84% classification accuracy; ten questions get to 89%. You pick six, and the chat hands back an Excel workbook the contact centre can use that afternoon.
Two phases inside the same chat
Phase 1 — Evaluate. First you pick the output format — an Excel workbook for teams who score new respondents in a spreadsheet, or a model bundle for engineers embedding the classifier in their own systems. The platform then screens the candidate questions and tests a ladder of every size from 4 to 12 features — nine rungs.
- The Excel path tests two interpretable classifiers — logistic regression and LDA (linear discriminant analysis) — for every rung; the model-bundle path tests up to fifteen classifier families and pickles the best.
- Every combination is scored on cross-validated accuracy, macro-F1 (the unweighted mean of per-segment F1, so small segments count as much as large ones) and calibration (how well the model’s confidence matches its hit rate, via Brier score and Expected Calibration Error, ECE).
- The chat returns a ranked ladder and auto-suggests the smallest rung that clears all three quality gates — accuracy ≥ 75%, macro-F1 ≥ 70%, ECE ≤ 1.00 — with logistic preferred over LDA on ties.
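The gate logic above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's code: the `Rung` record, the function names, the ten-bin ECE and the exact tie-break order (fewest features, then higher accuracy, then logistic before LDA) are assumptions, and the ladder rows are hypothetical apart from the 84% and 89% accuracies quoted earlier.

```python
from dataclasses import dataclass

# Gate thresholds as quoted in the text above.
MIN_ACCURACY, MIN_MACRO_F1, MAX_ECE = 0.75, 0.70, 1.00


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between confidence and hit rate over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the top bin
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            hit_rate = sum(h for _, h in members) / len(members)
            ece += len(members) / total * abs(avg_conf - hit_rate)
    return ece


@dataclass
class Rung:
    n_features: int
    classifier: str  # "logistic" or "lda"
    accuracy: float
    macro_f1: float
    ece: float


def suggest_rung(ladder):
    """Return the smallest rung that clears all three gates, or None."""
    passing = [
        r for r in ladder
        if r.accuracy >= MIN_ACCURACY and r.macro_f1 >= MIN_MACRO_F1 and r.ece <= MAX_ECE
    ]
    if not passing:
        return None
    # Assumed tie-break: fewest features, then higher accuracy, then logistic before LDA.
    return min(passing, key=lambda r: (r.n_features, -r.accuracy, r.classifier != "logistic"))


# Hypothetical ladder rows for illustration only.
ladder = [
    Rung(4, "logistic", 0.72, 0.69, 0.06),
    Rung(4, "lda", 0.71, 0.68, 0.07),
    Rung(6, "logistic", 0.84, 0.81, 0.04),
    Rung(6, "lda", 0.84, 0.80, 0.05),
    Rung(10, "logistic", 0.89, 0.87, 0.03),
]
best = suggest_rung(ladder)
print(best.n_features, best.classifier)
```

With these rows the four-question rungs fail the accuracy gate, so the sketch suggests the six-question logistic rung; the ten-question rung stays on the ladder for anyone who prefers the extra accuracy.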
Phase 2 — Finalise. You pick a rung — typically a trade-off between length and accuracy. The platform fits the final classifier on the full dataset and exports the deliverable for the path you chose: an Excel typing-tool workbook (plus the feature-set ladder CSV), or a pickled model bundle with a generated README and loader script. Drop in a new respondent’s answers, get back their predicted segment.
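To make the last step concrete, here is a rough sketch of how a fitted logistic classifier could turn a respondent's answers into a segment. It assumes a multinomial logistic model; the segment names, question IDs, coefficients and the `score_respondent` helper are all hypothetical and do not describe the workbook's or the bundle's actual internals.

```python
import math

# Hypothetical fitted parameters for a 3-segment, 2-question model (illustration only).
INTERCEPTS = {"Segment A": 0.2, "Segment B": -0.1, "Segment C": 0.0}
WEIGHTS = {
    "Segment A": {"q1": 0.8, "q2": -0.5},
    "Segment B": {"q1": -0.3, "q2": 0.9},
    "Segment C": {"q1": -0.5, "q2": -0.4},
}


def score_respondent(answers):
    """Return (predicted segment, probability per segment) via softmax over linear scores."""
    scores = {
        seg: INTERCEPTS[seg] + sum(w * answers[q] for q, w in WEIGHTS[seg].items())
        for seg in WEIGHTS
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {seg: math.exp(s - top) for seg, s in scores.items()}
    total = sum(exps.values())
    probs = {seg: e / total for seg, e in exps.items()}
    return max(probs, key=probs.get), probs


segment, probs = score_respondent({"q1": 5, "q2": 2})
print(segment, round(probs[segment], 3))
```

Each segment gets a linear score from the answers, and the softmax turns those scores into probabilities that sum to one. A spreadsheet can reproduce the same arithmetic with ordinary formulas, which is one plausible reason the Excel path sticks to linear classifiers.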
How feature screening works
The screening step reduces a large pool of candidate predictor questions (often 30–50+) down to a shortlist of the most segment-discriminating items. The platform fuses three independent scoring methods:
- ANOVA F-statistic — how much each predictor’s mean differs across segments (linear separation).
- Mutual Information — non-linear dependency between predictor and segment label.
- SHAP importance — game-theoretic contribution from a tree-based classifier (interaction effects).
Bootstrap stability (30 iterations of MI scoring) guards against features that score high by chance. Correlation pruning at r > 0.90 prevents two highly correlated questions from both surviving the shortlist.
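The screening idea can be illustrated with a small standard-library Python sketch. The text does not say how the three scores are fused, so averaging ranks is an assumption here, as are the function names and the toy data. The Mutual Information and SHAP scores are passed in as hypothetical precomputed numbers, and the bootstrap step is omitted for brevity.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F: between-segment variance over within-segment variance."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length columns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fuse_ranks(*score_tables):
    """Order features by their average rank (1 = best) across scoring methods."""
    features = list(score_tables[0])
    fused = {f: 0.0 for f in features}
    for table in score_tables:
        ordered = sorted(features, key=lambda f: table[f], reverse=True)
        for rank, f in enumerate(ordered, start=1):
            fused[f] += rank / len(score_tables)
    return sorted(features, key=lambda f: fused[f])


def prune_correlated(ranked, columns, threshold=0.90):
    """Walk the ranking and drop any feature too correlated with one already kept."""
    kept = []
    for f in ranked:
        if all(abs(pearson(columns[f], columns[k])) <= threshold for k in kept):
            kept.append(f)
    return kept


# Toy data: six respondents, two segments, three candidate questions.
labels = ["a", "a", "a", "b", "b", "b"]
columns = {"q1": [1, 2, 3, 4, 5, 6], "q2": [2, 4, 6, 8, 10, 13], "q3": [3, 1, 4, 1, 5, 2]}
f_scores = {q: anova_f(col, labels) for q, col in columns.items()}
mi_scores = {"q1": 0.40, "q2": 0.42, "q3": 0.05}    # hypothetical MI values
shap_scores = {"q1": 0.30, "q2": 0.25, "q3": 0.02}  # hypothetical SHAP values

ranked = fuse_ranks(f_scores, mi_scores, shap_scores)
shortlist = prune_correlated(ranked, columns)
print(ranked, shortlist)
```

In this toy data `q2` is almost a rescaled copy of `q1`, so it is pruned even though it ranks second; `q3` survives despite its weak score because it carries different information. How many features survive overall would still be decided by the ladder evaluation.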
- We have segments — how do we keep classifying new customers cheaply?
- We can’t run the full segmentation survey on every lead.
- We want the contact centre / sales team / website to know which segment a person belongs to.
- We did a segmentation last year — what’s the lightweight version?
- How do we make the segmentation operational, not just a deck?
No existing segmentation?
Start there first — Crowdmines’s Segmentation service builds the original typology; Typing Tool only operationalises one that already exists.
See Segmentation →
From 50 questions to a working Excel workbook.
Two phases inside the same chat session — evaluate, then finalise.
How Crowdmines compares to R / Python / SPSS and agency engagements.
Building a typing tool — a short-form classifier that assigns new respondents to existing segments — is traditionally a specialist task performed by a data scientist or research statistician using R, Python or SPSS’s discriminant analysis module. Some research agencies offer it as a service deliverable, typically scoped at 1–3 weeks and priced as a standalone engagement.
| Capability | Traditional (R / Python / SPSS) | Agency service | Crowdmines |
|---|---|---|---|
| Setup effort | Write feature-selection code, set up CV pipelines, tune classifiers, export manually | Brief the agency, send data, wait for delivery | Ask in the chat — variables mapped, screening automated |
| Feature screening | Manual: analyst picks candidate features using domain knowledge or stepwise selection | Agency uses their preferred method (varies) | Triple-metric screening (ANOVA + Mutual Information + SHAP) with bootstrap stability and correlation pruning |
| Feature ladder | Analyst tests a few sizes manually, compares accuracy | Agency tests a few configurations | Systematic ladder — every size from 4 to 12 features — × 2 classifiers on the Excel path = 18 configurations tested and ranked automatically |
| Classifier types | Analyst picks one (usually logistic) | Agency picks one | Excel path: logistic + LDA for every rung. Model-bundle path: up to fifteen classifier families — ridge / lasso / elasticnet logistic, decision tree, random forest, SVC, naive Bayes, Gaussian process, MLP, and XGBoost / LightGBM / CatBoost when installed |
| Output paths | Single bespoke deliverable per project | Single deliverable | Two paths picked up-front — Excel workbook (formula-native, no Python) or model bundle (joblib pickle + README + loader script) |
| Evaluation metrics | Analyst computes accuracy; F1 and calibration often skipped | Agency reports accuracy (sometimes) | Accuracy, macro-F1, Brier score and ECE — all cross-validated with standard deviations |
| Auto-selection | Analyst picks manually | Agency recommends | Platform suggests the best rung with transparent trade-off reasoning |
| Operational deliverable | Analyst builds a custom Excel workbook or scoring function | Agency delivers an Excel workbook (their format) | Standardised Excel typing-tool workbook + trained model pickle + evaluation CSV — all auto-generated |
| Turnaround | Days (analyst time) | 1–3 weeks | Minutes |
Six questions, 84% accuracy — an Excel workbook the contact centre can use that afternoon.
Triple-metric feature screening, cross-validated accuracy & calibration, transparent accuracy-vs-length trade-off. Ship the operational artefact in minutes, not weeks.