render-examples/data-processor-workflow


Customer Data Merge - Render Workflows Demo

Merge customer data from multiple sources into enriched profiles using parallel Render Workflows.

This demo showcases:

  • Parallel processing: 10 shards processed simultaneously
  • Multi-source merge: CRM + Billing + Product + Support → Enriched profiles
  • High throughput: 400K records processed in seconds
  • Both Python and TypeScript: Identical implementations in both languages

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        FRONTEND (Next.js)                           │
│                     UI - Trigger & Monitor                          │
└─────────────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│    Python API           │     │    TypeScript API       │
│    (FastAPI)            │     │    (Fastify)            │
└─────────────────────────┘     └─────────────────────────┘
              │                               │
              ▼                               ▼
┌─────────────────────────┐     ┌─────────────────────────┐
│   Python Workflow       │     │   TypeScript Workflow   │
│   (render_sdk)          │     │   (@renderinc/sdk)      │
└─────────────────────────┘     └─────────────────────────┘
              │                               │
              └───────────────┬───────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         SAMPLE DATA                                  │
│   crm.csv │ billing.csv │ product.csv │ support.csv (100K each)     │
└─────────────────────────────────────────────────────────────────────┘

Workflow: Shard-Based Parallel Processing

The workflow uses hash-based sharding to ensure deterministic routing:

  1. Load: Read all 4 CSV source files
  2. Route: Hash each customer_id to assign records to 10 shards
  3. Process: Spawn 10 parallel subtasks (one per shard)
  4. Merge: Each shard merges its customers' data from all sources
  5. Enrich: Calculate health_score, churn_risk, expansion_potential
  6. Aggregate: Combine all shard results into final output

customer_id → hash(customer_id) % 10 → shard_id

The same customer always routes to the same shard across all four source files.
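
The routing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of python/workflows/sharding.py: the function names shard_for and route are hypothetical, and CRC32 is an assumed stand-in for whatever hash the repo uses. A stable hash matters here because Python's built-in hash() is salted per process for strings, so it would not route consistently across separate task runs.

```python
import zlib

NUM_SHARDS = 10  # shard count described in the workflow above


def shard_for(customer_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a customer_id to a shard with a hash that is stable across processes."""
    # Assumption: CRC32 is used for illustration only; any stable hash works.
    return zlib.crc32(customer_id.encode("utf-8")) % num_shards


def route(rows: list[dict], num_shards: int = NUM_SHARDS) -> list[list[dict]]:
    """Group the rows of one source file into per-shard buckets."""
    shards: list[list[dict]] = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row["customer_id"], num_shards)].append(row)
    return shards
```

Calling route once per source file (CRM, billing, product, support) with the same num_shards places every record for a given customer in the same bucket index, which is what lets each shard merge its customers independently.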

Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 20+
  • Render CLI 2.11.0+ (brew install render on macOS)
  • Render account with Workflows access

Local Development

  1. Generate sample data:

    cd scripts
    python generate_data.py --rows 1000  # Small dataset for testing
    # python generate_data.py --rows 100000  # Full 100K dataset

  2. Start the local workflow server (pick Python or TypeScript):

    The Render CLI runs a local task server on port 8120:

    Python:

    cd python/workflows
    pip install -r requirements.txt
    render workflows dev -- python main.py

    TypeScript:

    cd typescript/workflows
    npm install
    render workflows dev -- npx tsx src/main.ts

    Verify tasks registered:

    render workflows list --local

  3. Start the matching API (pick one):

    Set RENDER_USE_LOCAL_DEV=true so the API triggers the local workflow server instead of Render's API:

    Python (default, runs on http://localhost:8001):

    cd python/api
    pip install -r requirements.txt
    RENDER_USE_LOCAL_DEV=true python main.py

    TypeScript (runs on http://localhost:8002):

    cd typescript/api
    npm install
    RENDER_USE_LOCAL_DEV=true npm run dev

    If using the TypeScript API, also set NEXT_PUBLIC_API_URL=http://localhost:8002 before starting the frontend.

  4. Start the frontend:

    cd frontend
    npm install
    npm run dev
    # Runs on http://localhost:3000

  5. Open http://localhost:3000 and click Run Workflow.

Environment Variables

Variable             | Default                                                   | Used by
---------------------|-----------------------------------------------------------|--------------------------------------
RENDER_API_KEY       | (required for deployed services)                          | API services
RENDER_USE_LOCAL_DEV | false                                                     | API services (set true for local dev)
WORKFLOW_SLUG        | data-processor-workflows-py / data-processor-workflows-ts | API services
DATA_DIR             | ../../sample_data                                         | Workflow services
NEXT_PUBLIC_API_URL  | http://localhost:8001                                     | Frontend

Deploy to Render

1. Deploy Frontend and API (Blueprint)

The Blueprint (render.yaml) deploys the frontend and the Python API by default. If you prefer TypeScript, edit render.yaml to uncomment the TypeScript API and comment out the Python one (see the instructions in the file).

Use the Deploy to Render button on the repository page, or deploy manually:

  1. Push this repo to GitHub/GitLab
  2. In Render Dashboard: New → Blueprint
  3. Connect your repo and deploy

2. Create Workflows (Manual)

Workflows are not yet supported in Blueprints. Create them manually:

Python Workflow

  1. In Render Dashboard: New → Workflow
  2. Connect your repo
  3. Settings:
    • Name: data-processor-workflows-py
    • Root Directory: python/workflows
    • Build Command: pip install -r requirements.txt
    • Start Command: python main.py
  4. Deploy

TypeScript Workflow

  1. In Render Dashboard: New → Workflow
  2. Connect your repo
  3. Settings:
    • Name: data-processor-workflows-ts
    • Root Directory: typescript/workflows
    • Build Command: npm install && npm run build
    • Start Command: npm start
  4. Deploy

3. Configure Environment Variables

On each API service, set:

  • RENDER_API_KEY: Your Render API key (create at Dashboard → Account → API Keys)
  • WORKFLOW_SLUG: The workflow service name (the API appends /merge_customer_data automatically), e.g.:
    • Python: data-processor-workflows-py
    • TypeScript: data-processor-workflows-ts
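
To make the slug convention concrete, here is a small sketch of how the task identifier could be assembled from WORKFLOW_SLUG. The helper name task_identifier is hypothetical; only the /merge_customer_data suffix comes from the description above, and the slash handling is an assumption.

```python
def task_identifier(workflow_slug: str, task_name: str = "merge_customer_data") -> str:
    """Join the workflow service name and the task name with a single slash."""
    # Strip stray slashes so "slug/" and "slug" produce the same identifier.
    return f"{workflow_slug.strip('/')}/{task_name}"


print(task_identifier("data-processor-workflows-py"))
```

This is why WORKFLOW_SLUG should hold only the service name: including the task name in the variable would double the suffix.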

Project Structure

/
├── frontend/                    # Next.js brutalist UI
│   ├── app/
│   │   ├── page.tsx             # Main demo page
│   │   └── how-it-works/
│   │       └── page.tsx         # Workflow visualizer
│   ├── components/
│   │   ├── WorkflowTrigger.tsx  # Run button
│   │   ├── EventLog.tsx         # Terminal-style log
│   │   ├── DataPreview.tsx      # Before/after view
│   │   └── ResultsSummary.tsx   # Stats and shard timings
│   └── lib/
│       ├── api.ts               # API client
│       └── workflow-config.ts   # Visualizer config
│
├── python/
│   ├── api/                     # FastAPI service
│   │   └── main.py              # Trigger endpoints
│   └── workflows/               # Render Workflow
│       ├── main.py              # Task definitions
│       ├── sharding.py          # Hash-based routing
│       └── enrichment.py        # Score calculations
│
├── typescript/
│   ├── api/                     # Fastify service
│   │   └── src/index.ts         # Trigger endpoints
│   └── workflows/               # Render Workflow
│       └── src/
│           ├── main.ts          # Task definitions
│           ├── sharding.ts      # Hash-based routing
│           └── enrichment.ts    # Score calculations
│
├── sample_data/                 # Generated CSVs
├── scripts/
│   └── generate_data.py         # Data generator
│
├── render.yaml                  # Blueprint (frontend + APIs)
└── README.md

Sample Data Schema

Input CSVs

crm.csv

customer_id,email,company_name,industry,employee_count,deal_stage,deal_value,sales_owner,last_contact

billing.csv

customer_id,email,plan,mrr,payment_status,subscription_start,last_payment

product.csv

customer_id,email,signup_date,last_active,total_sessions,features_used,usage_pct,account_status

support.csv

customer_id,email,total_tickets,open_tickets,avg_resolution_hrs,last_ticket_date,nps_score,csat_score
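
Since customer_id and email are the only columns the four schemas share, merging can be sketched as a per-customer dictionary union. The helper names read_rows and merge_sources are hypothetical, the sample rows are made up for illustration, and the first-value-wins rule for shared columns is an assumption rather than the repo's documented behavior.

```python
import csv
import io
from collections import defaultdict


def read_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries keyed by header name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def merge_sources(sources: list[list[dict]]) -> dict[str, dict]:
    """Union the columns of every source into one profile per customer_id."""
    profiles = defaultdict(dict)
    for rows in sources:
        for row in rows:
            profile = profiles[row["customer_id"]]
            for column, value in row.items():
                # Assumption: for shared columns (customer_id, email) the first value wins.
                profile.setdefault(column, value)
    return dict(profiles)


# Tiny made-up inputs using a subset of the crm.csv and billing.csv columns.
crm = read_rows("customer_id,email,company_name\nC1,a@example.com,Acme\n")
billing = read_rows("customer_id,email,plan,mrr\nC1,a@example.com,pro,99\n")
merged = merge_sources([crm, billing])
print(merged["C1"])
```

A customer missing from one source simply lacks that source's columns in the merged profile, so downstream enrichment code would need to handle absent keys.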

Output: Enriched Profile

All fields merged, plus calculated fields:

  • health_score: 0-100 based on usage, payments, NPS, support tickets
  • churn_risk: LOW / MEDIUM / HIGH
  • expansion_potential: LOW / MEDIUM / HIGH
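
The scoring formulas live in enrichment.py and enrichment.ts and are not reproduced here. The sketch below only illustrates the shape such calculations could take: every weight, threshold, the "current" payment status value, and the 0-10 NPS scale are assumptions chosen for the example.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def health_score(usage_pct: float, payment_status: str,
                 nps_score: float, open_tickets: int) -> int:
    """Combine four signals into a 0-100 score (hypothetical weights)."""
    usage = clamp(usage_pct, 0, 100) * 0.4                   # up to 40 points
    payments = 25.0 if payment_status == "current" else 0.0  # up to 25 points
    nps = clamp(nps_score, 0, 10) * 2.5                      # up to 25 points
    support = clamp(10 - 2 * open_tickets, 0, 10)            # up to 10 points
    return round(usage + payments + nps + support)


def churn_risk(score: int) -> str:
    """Bucket a health score into LOW / MEDIUM / HIGH (hypothetical cutoffs)."""
    if score >= 70:
        return "LOW"
    if score >= 40:
        return "MEDIUM"
    return "HIGH"


def expansion_potential(score: int, usage_pct: float) -> str:
    """Heavily used, healthy accounts are treated as expansion candidates."""
    if score >= 70 and usage_pct >= 80:
        return "HIGH"
    if usage_pct >= 50:
        return "MEDIUM"
    return "LOW"
```

Keeping each score a pure function of the merged profile makes the enrichment step easy to run independently inside every shard.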

Performance

With 100K rows per source (400K total records):

Metric              | Value
--------------------|----------------
Total records       | 400,000
Shards              | 10
Parallel tasks      | 10
Estimated time      | 2-5 seconds
Sequential estimate | ~20+ seconds
Speedup             | ~5-10x
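
The ten parallel tasks follow a fan-out/aggregate pattern, sketched below with Python's standard thread pool. This is a local stand-in and does not use the Render SDK: process_shard and run_all are hypothetical names, and the per-shard work is reduced to counting distinct customers. Threads here illustrate the structure only; they will not reproduce the timings in the table.

```python
from concurrent.futures import ThreadPoolExecutor


def process_shard(shard_id: int, rows: list[dict]) -> dict:
    """Placeholder for the per-shard merge and enrich work."""
    distinct = {row["customer_id"] for row in rows}
    return {"shard_id": shard_id, "profiles": len(distinct)}


def run_all(shards: list[list[dict]]) -> dict:
    """Fan out one task per shard, then aggregate the per-shard results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        # map() yields results in shard order, regardless of completion order.
        results = list(pool.map(process_shard, range(len(shards)), shards))
    return {
        "total_profiles": sum(r["profiles"] for r in results),
        "shards": results,
    }
```

Because no customer appears in more than one shard, the aggregate step is a plain sum with no cross-shard deduplication.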

Customization

Change shard count

Edit NUM_SHARDS in:

  • python/workflows/sharding.py
  • typescript/workflows/src/sharding.ts
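
Before changing the shard count, it can help to check how evenly a candidate value spreads customers. The sketch below assumes a CRC32-based hash and a made-up CUST-000001 style ID format, neither of which is confirmed by the repo; shard_sizes is a hypothetical helper.

```python
import zlib
from collections import Counter


def shard_sizes(customer_ids: list[str], num_shards: int) -> list[int]:
    """Count how many customer IDs land in each shard for a given shard count."""
    counts = Counter(zlib.crc32(cid.encode("utf-8")) % num_shards for cid in customer_ids)
    return [counts.get(shard, 0) for shard in range(num_shards)]


ids = [f"CUST-{i:06d}" for i in range(10_000)]  # hypothetical ID format
for n in (10, 20):
    sizes = shard_sizes(ids, n)
    print(n, "shards: min", min(sizes), "max", max(sizes))
```

Note that changing the modulus re-routes most customers, so all four sources must be sharded with the same value in a given run.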

Modify enrichment logic

Edit the calculation functions in:

  • python/workflows/enrichment.py
  • typescript/workflows/src/enrichment.ts

Generate different data sizes

python scripts/generate_data.py --rows 10000    # 10K rows
python scripts/generate_data.py --rows 1000000  # 1M rows

Troubleshooting

Workflow not found

  • Check WORKFLOW_SLUG matches the workflow service name in the Dashboard
  • Ensure the workflow deployed successfully in the Dashboard

API key errors

  • Verify RENDER_API_KEY is set correctly
  • API key needs Workflows permissions

CSV not found

  • Check DATA_DIR environment variable
  • Ensure CSVs are accessible from workflow runtime
