The domain-finder workflow is one of the most important production contracts in this repository. It replaces opaque Pipedream behavior with explicit candidate selection, audit artifacts, and optional writeback execution.
What the pipeline covers
The workflow breaks the old process into auditable stages:
- measure backlog and bottlenecks
- fetch candidate domains from BigQuery
- pick the best domain per company
- audit the decision set
- compare against the legacy selector when needed
- generate writeback SQL
- optionally execute writeback or repair SQL
Backlog analysis
Start by measuring whether the pipeline is starved upstream:
node ./dist/cli.js domainfinder:backlog:bq --market dach --out ./data/domainfinder-backlog.json
This report summarizes the backlog and highlights which upstream stages are the real bottlenecks.
Candidate fetch
node ./dist/cli.js domainfinder:candidates:bq \
--market dach \
--limit 500 \
--out ./data/domain-candidates.json \
--sql-out ./data/domain-candidates.sql
This is the warehouse-backed input to the selection phase.
Selection logic
The improved selector applies this policy in order:
- prefer
linkedinDomainwhen it exists and is not blacklisted - otherwise prefer the root domain derived from
linkedinWebsitewhen it exists and is not blacklisted - otherwise choose the non-blacklisted candidate with the highest Hunter email count
- otherwise fall back to the first non-null non-blacklisted candidate
- if nothing usable remains, mark the company as
no-domain
Blacklisted domains currently include:
linkedin.combit.lylinktr.eefacebook.com
Run the selector with:
node ./dist/cli.js domainfinder:select --in ./data/domain-candidates.json --out ./data/domain-decisions.json
Audit the decisions
node ./dist/cli.js domainfinder:audit --in ./data/domain-decisions.json --out ./data/domain-audit.json
The audit report gives you:
- counts by decision reason
- accepted versus rejected writeback candidates
- risk flags for each decision
Compare against legacy Pipedream behavior
node ./dist/cli.js domainfinder:compare-pipedream --in ./data/domain-candidates.json --out ./data/domain-comparison.json
Use this when you need to quantify how the improved selector differs from the old Hunter-heavy strategy.
Generate improved input SQL
If you need the replacement input view for the old workflow:
node ./dist/cli.js domainfinder:input-sql --market dach --out ./data/domainFinder_input_v2.sql
Writeback behavior
Generate conservative writeback SQL:
node ./dist/cli.js domainfinder:writeback-sql \
--in ./data/domain-decisions.json \
--out ./data/domain-writeback.sql \
--trace-id salesprompter-cli-dach-20260308
Or generate and optionally execute it:
node ./dist/cli.js domainfinder:writeback:bq \
--in ./data/domain-decisions.json \
--out ./data/domain-writeback.sql \
--trace-id salesprompter-cli-dach-20260308 \
--execute
Writeback rules:
- writes target
SalesPrompter.domainFinder_output no-domaindecisions are excluded- blacklisted domains are excluded
trace_idis preserved for provenance
Audit existing warehouse exposure
To measure the current state of warehouse-visible domains:
node ./dist/cli.js domainfinder:audit-existing:bq --market dach --out ./data/domain-existing-audit.json
To compare two audit snapshots:
node ./dist/cli.js domainfinder:audit-delta \
--before ./data/domain-existing-audit-before.json \
--after ./data/domain-existing-audit-after.json \
--out ./data/domain-existing-audit-delta.json
Repair existing chosen domains
node ./dist/cli.js domainfinder:repair-existing:bq \
--market dach \
--mode conservative \
--limit 5000 \
--out ./data/domain-repair.sql \
--trace-id salesprompter-cli-repair-dach
Repair modes:
| Mode | Behavior |
|---|---|
conservative |
fix missing or blacklisted chosen domains |
aggressive |
fix missing, blacklisted, and mismatched domains |
mismatch-only |
fix only mismatched chosen domains |
Run the full pipeline
node ./dist/cli.js domainfinder:run:bq \
--market dach \
--limit 500 \
--out-dir ./data/domainfinder-run \
--trace-id salesprompter-cli-dach-20260308
Artifacts created in --out-dir:
- candidate SQL
- candidate JSON
- decision JSON
- audit JSON
- writeback SQL
Use --execute-writeback only when you want the generated writeback SQL executed immediately.