DeepStream Coding Agent

A project showcasing how to leverage AI coding assistants (Cursor, Claude Code, etc.) for accelerated NVIDIA DeepStream SDK application development using a curated agentic skill and structured prompts.

Disclaimer: Code generated with AI coding assistants is intended as a development starting point. All generated code must undergo your full software development lifecycle (SDLC) — including code review, testing, and security validation — before production use.


Prerequisites

For code generation (using the skill and prompts)

  • AI coding assistant that supports agentic skills (e.g., Cursor, Claude Code)

No GPU, SDK, or special hardware is required — the skill and example prompts work on any system.

For running the generated code

The following are required on the target execution environment:

  • NVIDIA DeepStream SDK 9.0 — installed locally or available via NVIDIA NGC container
  • Python 3.12+ with the pyservicemaker package
  • NVIDIA GPU with driver version 590 or later
  • CUDA 13.1 and TensorRT 10.14.1.48
  • Supported OS: Ubuntu 24.04 (x86_64 or ARM64/Jetson)
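A quick way to sanity-check a target machine against the list above is a small shell report. This is only a sketch: the `report_env` helper is not part of the project, and the `pyservicemaker` import check is a heuristic, not a substitute for the Developer Guide's setup steps.

```shell
# Minimal environment report for the runtime requirements above.
# Each probe degrades gracefully if the tool is absent.
report_env() {
  echo "Driver : $(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null || echo 'nvidia-smi not found')"
  echo "Python : $(python3 --version 2>&1 || echo 'python3 not found')"
  echo "CUDA   : $(nvcc --version 2>/dev/null | grep -o 'release [0-9.]*' || echo 'nvcc not found')"
  if python3 -c 'import pyservicemaker' 2>/dev/null; then
    echo "pyservicemaker: installed"
  else
    echo "pyservicemaker: NOT installed"
  fi
}

report_env
```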

The deepstream-import-vision-model skill needs a few extra runtime tools (trtexec, wkhtmltopdf, mediainfo, deepstream-app, an optimum-capable Python venv). They are listed and auto-checked by the pre-flight script in skills/deepstream-import-vision-model/SKILL.md.
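The authoritative pre-flight check lives in SKILL.md; a hand-rolled version that only tests whether the tools named above are on PATH could look like this (the `check_tools` helper is illustrative, not part of the skill):

```shell
# check_tools: report which of the given commands are on PATH;
# returns non-zero if any are missing.
check_tools() {
  rc=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "missing  $tool"
      rc=1
    fi
  done
  return $rc
}

# Tools listed for deepstream-import-vision-model:
check_tools trtexec wkhtmltopdf mediainfo deepstream-app \
  || echo "install the missing tools before running the skill"
```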

For detailed environment setup, refer to the DeepStream SDK Developer Guide.


Project Structure

DeepStream-Coding-Agent/
├── skills/                  # Agentic skills for guided DeepStream development
│   ├── deepstream-dev/      # DeepStream development skill with condensed references
│   └── deepstream-import-vision-model/    # Autonomous vision-model onboarding & benchmarking pipeline skill
├── example_prompts/         # Pre-built prompts for code generation
├── LICENSE                  # CC-BY-4.0 AND Apache-2.0
└── README.md                # This file

Purpose

This project provides the tooling and reference material needed to:

  1. Supply domain-specific context to AI coding assistants through a curated agentic skill
  2. Generate production-ready DeepStream code using well-structured prompts
  3. Accelerate development of video analytics pipelines with AI assistance

Agentic Skills

An agentic skill is a structured knowledge package that an AI coding assistant can automatically discover and activate during code generation. It contains domain-specific rules, reference documentation, and guardrails that guide the AI agent to produce accurate, idiomatic code — without the developer needing to manually reference files in every conversation.

The skills/deepstream-dev/ directory contains a DeepStream agentic skill that follows the standard SKILL.md convention supported by AI coding assistants such as Cursor, Claude Code, and others.

This project ships two complementary skills:

  • deepstream-dev (reference-rich: you write code, the agent consults docs) — use it to hand-author or refine a pyservicemaker / GStreamer DeepStream pipeline with the agent answering API questions correctly.
  • deepstream-import-vision-model (autonomous orchestration: the agent runs an end-to-end pipeline) — use it to take any HuggingFace or NGC object-detection model and produce a TensorRT engine, a DeepStream multi-stream benchmark, and a PDF report, fully unattended.

Skip ahead to Skill: deepstream-import-vision-model for the model-onboarding workflow.

Skill: deepstream-dev

This skill targets NVIDIA DeepStream SDK 9.0 development using the Python pyservicemaker API. When activated, it instructs the AI agent to consult bundled reference documents before generating any code, significantly reducing inaccuracies and ensuring correct API usage.

Bundled reference topics:

  • gstreamer_plugins.md: GStreamer plugin properties
  • service_maker_api.md: Pipeline/Flow API, metadata access, probes
  • use_cases_pipelines.md: Pipeline patterns (playback, multi-inference, cascaded GIE)
  • kafka_messaging.md: Kafka/message broker setup and configuration
  • best_practices.md: Design patterns, pitfalls, anti-patterns
  • buffer_apis.md: BufferProvider/Feeder and BufferRetriever/Receiver
  • media_extractor_advanced.md: MediaExtractor, MediaChunk, FrameSampler
  • utilities_config.md: PerfMonitor, EngineFileMonitor, SourceConfig
  • nvinfer_config.md: nvinfer config file format and all parameters
  • tracker_config.md: nvtracker config (NvDCF, IOU, DeepSORT, NvSORT)
  • troubleshooting.md: Error messages and solutions
  • rest_api_dynamic.md: REST API, dynamic source management
  • metamux_config.md: nvdsmetamux config, parallel multi-model inference, metadata merging
  • docker_containers.md: Docker images, Dockerfile examples, pyservicemaker install, container run commands

Installing the Skill

Copy the deepstream-dev skill directory (including its references/ subdirectory) into the skills folder recognized by your AI coding assistant. You can install it at the user level (available across all projects) or at the workspace level (scoped to a single project).

  • Cursor: ~/.cursor/skills/deepstream-dev/ (user-level) or <workspace>/.cursor/skills/deepstream-dev/ (workspace-level)
  • Claude Code: ~/.claude/skills/deepstream-dev/ (user-level) or <workspace>/.claude/skills/deepstream-dev/ (workspace-level)
  • Codex: ~/.codex/skills/deepstream-dev/ (user-level) or <workspace>/.codex/skills/deepstream-dev/ (workspace-level)
  • Other tools: consult your tool's documentation for the skills directory location

Step 1: Create the Skills Directory

# Example: Cursor user-level
mkdir -p ~/.cursor/skills/

# Example: Claude Code user-level
mkdir -p ~/.claude/skills/

# Example: Codex user-level
mkdir -p ~/.codex/skills/

# Example: workspace-level (replace .cursor with your tool's directory)
mkdir -p <workspace>/.cursor/skills/

Step 2: Copy the Skill

# User-level (replace path with your tool's skills directory)
cp -r skills/deepstream-dev ~/.cursor/skills/

# Example: Claude Code user-level
cp -r skills/deepstream-dev ~/.claude/skills/

# Example: Codex user-level
cp -r skills/deepstream-dev ~/.codex/skills/

# Or workspace-level
cp -r skills/deepstream-dev <workspace>/.cursor/skills/

After copying, the directory structure should look like:

<skills-directory>/
└── deepstream-dev/
    ├── SKILL.md              # Skill definition with rules and quick references
    └── references/           # Condensed reference documents
        ├── best_practices.md
        ├── buffer_apis.md
        ├── gstreamer_plugins.md
        ├── kafka_messaging.md
        ├── media_extractor_advanced.md
        ├── nvinfer_config.md
        ├── rest_api_dynamic.md
        ├── service_maker_api.md
        ├── tracker_config.md
        ├── troubleshooting.md
        ├── use_cases_pipelines.md
        ├── utilities_config.md
        ├── metamux_config.md
        └── docker_containers.md
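Besides asking the agent (Step 3 below), the install can be confirmed at the filesystem level. This sketch uses the Cursor user-level path; the `check_skill` helper and the path are illustrative, so adjust them for your tool.

```shell
# check_skill: confirm a skill directory has its SKILL.md and
# references/ in place, and report how many reference files it ships.
check_skill() {
  dir=$1
  if [ -f "$dir/SKILL.md" ] && [ -d "$dir/references" ]; then
    echo "installed: $(ls "$dir/references" | wc -l) reference file(s) under $dir"
  else
    echo "not installed at $dir"
    return 1
  fi
}

check_skill "$HOME/.cursor/skills/deepstream-dev" || true
```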

Step 3: Verify the Installation

  1. Open (or restart) your AI coding assistant.
  2. Open the agent / chat panel.
  3. Ask a DeepStream-related question, for example:
    Create a DeepStream pipeline that reads a video file and runs object detection using ResNet18.
    
  4. The agent should automatically activate the deepstream-dev skill and consult its reference documents before generating code.

Tip: The skill is most effective in Agent mode, where the AI assistant automatically selects and activates relevant skills based on the task context — no manual file referencing needed.


Skill: deepstream-import-vision-model

deepstream-import-vision-model is an autonomous skill: instead of helping you write code, it executes a complete model bring-up pipeline and hands you back a benchmarked TensorRT engine plus a publication-ready PDF report.

Pipeline (runs unattended):

  1. Model Acquire — parses a HuggingFace or NVIDIA NGC URL, downloads ONNX (or exports SafeTensors → ONNX via optimum-cli), extracts labels.
  2. Engine Build — builds a dynamic TensorRT engine via trtexec, with iterative batch-size scaling and warm-cache reuse.
  3. DeepStream Pipeline — generates a custom nvinfer bbox parser, builds the .so, runs single-stream KITTI validation, then a multi-stream sweep.
  4. Report — produces 5 benchmark charts and a Markdown / HTML / PDF report under models/<model_name>/reports/.
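For orientation, the engine-build step (2) boils down to a trtexec invocation along these lines. Everything concrete in this sketch is a placeholder — the input tensor name (`input`), the 640x640 resolution, the batch size of 8, and the paths — whereas the skill derives the real values from the ONNX model itself:

```shell
# Sketch of a dynamic-shape FP16 engine build with a reusable timing cache.
CMD="trtexec \
  --onnx=models/my_model/model/model.onnx \
  --saveEngine=models/my_model/benchmarks/model_b8.engine \
  --fp16 \
  --minShapes=input:1x3x640x640 \
  --optShapes=input:8x3x640x640 \
  --maxShapes=input:8x3x640x640 \
  --timingCacheFile=models/my_model/benchmarks/timing.cache"

echo "$CMD"
# Only attempt the build if trtexec is actually on PATH:
if command -v trtexec >/dev/null 2>&1; then
  eval "$CMD" || echo "build failed (expected if the placeholder paths do not exist)"
fi
```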

Supported model scope: Object detection models only. Classification, segmentation, pose estimation, and other vision tasks are rejected up front (architecture detected from config.json — downloaded for HuggingFace models, extracted from the archive for NGC models).

The following detection architecture families are supported:

  • Transformer-based detectors — query-based encoder-decoder designs (e.g., DETR and RT-DETR family, including TAO Transformer variants from NGC)
  • One-stage grid-based detectors — single-pass, anchor-based or anchor-free designs that predict boxes directly from spatial feature grids (e.g., YOLO family)
  • Open-vocabulary / zero-shot detectors — vision-language models that localize objects described by free-form text queries at inference time (e.g., GroundingDINO / OWL-ViT)

Models that fall outside these families are untested; custom bbox parsers may need manual adjustment for novel output tensor layouts.

Bundled references

  • model-acquire.md: HF / NGC URL parsing, ONNX vs SafeTensors detection, optimum export, label extraction
  • engine-build.md: trtexec flags, dynamic shapes, batch-size scaling, timing-cache reuse, PEAK_GPU_STREAMS derivation
  • pipeline-run.md: Custom nvinfer bbox parser (with the mandatory obj = {} zero-init), single-stream KITTI validation, multi-stream sweep
  • report-generation.md: benchmark_data.json schema, 5-chart generation, 12-section Markdown report, HTML + PDF render via wkhtmltopdf

Installing the skill

Same install paths as deepstream-dev:

# Example: Cursor user-level
cp -r skills/deepstream-import-vision-model ~/.cursor/skills/

# Example: Claude Code user-level
cp -r skills/deepstream-import-vision-model ~/.claude/skills/

# Example: Codex user-level
cp -r skills/deepstream-import-vision-model ~/.codex/skills/

# Or workspace-level
cp -r skills/deepstream-import-vision-model <workspace>/.cursor/skills/

After copying:

<skills-directory>/
└── deepstream-import-vision-model/
    ├── SKILL.md            # Top-level skill definition + critical rules
    ├── references/         # 4 phase references (model-acquire, engine-build, pipeline-run, report-generation)
    └── scripts/            # Helpers: model/, engine/, deepstream/, report/

Verifying the installation

  1. Open (or restart) your AI coding assistant on a workspace where you want bring-up artifacts to land (the skill writes to models/<model_name>/ relative to the project root).

  2. Ask:

    Use deepstream-import-vision-model to onboard and benchmark this detection model
    end-to-end, and produce the PDF benchmark report:
    https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/rtdetr_2d_warehouse
    
    Use the default sample video at
    /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4.
    

    In Cursor:

    @deepstream-import-vision-model onboard and benchmark this detection model end-to-end,
    and produce the PDF benchmark report:
    https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/rtdetr_2d_warehouse
    
    Use the default sample video at
    /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4.
    

    For an interactive variant that prompts for inputs with defaults, see example_prompts/import_vision_model_detection_pipeline.md.

  3. The agent should activate deepstream-import-vision-model, run pre-flight checks (nvidia-smi, trtexec, wkhtmltopdf, mediainfo, deepstream-app), and proceed through Steps 1–8 without further prompting.

Output structure

models/<model_name>/
  model/          # ONNX file(s)
  parser/         # custom nvinfer bbox parser (.cpp, .so)
  config/         # nvinfer config, ds-app config, labels.txt
  benchmarks/     # TRT engines + trtexec / DS logs
  reports/        # benchmark_report.md / .html / .pdf + charts/
  samples/        # output .mp4, KITTI detections, test frames

The skill treats a final PDF (reports/benchmark_report_<model_name>.pdf) larger than 500 KB as its own success signal that the charts were embedded correctly.
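That size heuristic is easy to reproduce by hand. In this sketch the `check_report` helper and the model name are placeholders; only the >500 KB threshold comes from the text above.

```shell
# check_report: apply the >500 KB "charts embedded" heuristic to a PDF.
check_report() {
  pdf=$1
  if [ ! -f "$pdf" ]; then
    echo "not found: $pdf"
    return 1
  fi
  size=$(wc -c < "$pdf")
  if [ "$size" -gt 512000 ]; then
    echo "report OK ($size bytes)"
  else
    echo "report suspiciously small ($size bytes); charts may be missing"
    return 1
  fi
}

check_report "models/my_model/reports/benchmark_report_my_model.pdf" || true
```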

Tip: Like deepstream-dev, this skill works best in Agent mode. Manual @-mention is not required after install — the assistant picks it up from the URL pattern in your prompt.


Using Example Prompts

The example_prompts/ directory contains pre-built prompts for generating DeepStream applications. Each prompt file provides a complete specification that an AI agent can follow to produce working code.

Getting started? Begin with video_infer_app.md for a minimal single-stream inference example, then progress to multi_stream_tracker.md for multi-stream and tracking capabilities.

Available Prompts

  • multi_stream_tracker.md: Multi-stream RTSP app with tracker and 2x2 tiled display
  • rtvi_vlm_core_app.md: Complete RTSP video processing app with VLM integration
  • rtvi_vlm_openapi_spec.md: FastAPI microservice with OpenAPI specification; use after generating the core app with @rtvi_vlm_core_app.md
  • video_infer_app.md: Basic video file inference with bounding box display
  • video_object_count.md: Video inference with object detection counting
  • video_parallel_infer_app.md: Parallel multi-model inference with demux stream selection and metadata merging
  • yolov26s_detection.md: YOLOv26s model download, ONNX export, and custom parsing library
  • nvdsdynamicsrcbin_app.md: Use of the nvdsdynamicsrcbin plugin
  • msgconv_kafka.md: Video inference with a message converter sending detection results to Kafka
  • single_view_3d_tracker.md: Single-view 3D tracking; given a static camera's camera matrix and a human model, estimates and tracks object states in the 3D physical world

Step-by-Step Guide: Using Prompts

Step 1: Open the AI Chat / Agent Panel

Open the agent or chat panel in your AI coding assistant. Most tools provide a keyboard shortcut or sidebar icon for this.

Step 2: Reference the Prompt File

Use your tool's file-referencing feature (e.g., @ mentions) to include the prompt file:

@example_prompts/rtvi_vlm_core_app.md

Or simply type @ and start typing the filename to search.

Step 3: Execute the Prompt

Option A: Direct execution

Reference the file in the chat and instruct the agent to follow it:

Follow the instructions in @example_prompts/rtvi_vlm_core_app.md to generate the application.

Option B: Incremental execution

For complex prompts, break them into smaller steps:

Based on @example_prompts/rtvi_vlm_core_app.md, first implement the vLLM backend module.

Then follow up with:

Now implement the frame selection logic as described in the prompt.

Step 4: Review and Iterate

  1. Review the generated code in the diff view.
  2. Accept or reject individual changes.
  3. Ask follow-up questions for refinements:
    Can you optimize the GPU memory usage in the generated stream_processor.py?
    

Example Workflow: Generating the RTVI Application

Here's a complete workflow for generating the RTVI VLM application:

1. Generate Core Application

@example_prompts/rtvi_vlm_core_app.md

Generate the complete application following these instructions.

2. Add FastAPI Microservice

@example_prompts/rtvi_vlm_openapi_spec.md

Create the FastAPI server with all endpoints shown in @rtvi_vlm_openapi_spec.png

Best Practices for AI-Assisted Development

Writing Effective Prompts

  1. Be specific — Include exact requirements, constraints, and expected outputs
  2. Reference context — Use @ mentions to include relevant files and documents
  3. Break down complex tasks — Divide large features into smaller, focused prompts
  4. Include examples — Show expected input/output formats when applicable
  5. Specify the deployment target — Mention whether the application targets dGPU (x86_64) or Jetson (ARM64), as pipeline elements and sink choices may differ

Iterating on Generated Code

  1. Review before accepting — Always inspect generated pipelines for correct element linking and property values
  2. Test incrementally — Run the pipeline after each major change rather than building the entire application at once
  3. Use the troubleshooting reference — If a pipeline fails, ask the agent to consult troubleshooting.md for known error patterns
  4. Provide error output — When debugging, paste the full GStreamer or DeepStream error log into the chat for more accurate fixes

Demo Video

Build Vision AI Pipelines with DeepStream Coding Agents

Additional Resources


Contributing

This project is currently not accepting contributions.


License

This project is licensed under CC-BY-4.0 AND Apache-2.0.

SPDX-License-Identifier: CC-BY-4.0 AND Apache-2.0
