Using LLM + Home Assistant to Detect Delivery Shippers at the Gate
Problem: Conventional AI detection systems such as Frigate can detect a "person" but understand no occupational context – they cannot tell whether it is a delivery person, a guest, or someone just passing by.
Solution: Combine a Large Language Model (LLM) with Home Assistant to perform deeper image analysis, detect visual signs of delivery workers (shippers), and automatically count them.
Workflow
1. Camera Detection
   ↓
2. Person Detected?
   ├─ Yes → AI Analysis
   │         ↓
   │       3. Gemini Flash
   │         ↓
   │       4. Count Shipper
   │         ↓
   │       5. > 0 Shipper?
   │         ├─ Yes → Update Counter
   │         │         ↓
   │         │       6. Auto Reset (5min)
   │         └─ No → Reset (5min)
   └─ No → End
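The branching above can be sketched in plain Python. Every callable here is an illustrative stand-in for a Home Assistant action shown later in the article, not a real API:

```python
def run_pipeline(label, count_shippers, set_counter, reset_counter, sleep):
    """Sketch of the decision flow above. All callables are hypothetical
    stand-ins for the Home Assistant services used in the automation."""
    if label != "person":        # step 2: no person detected -> end
        return
    count = count_shippers()     # steps 3-4: LLM analysis returns a count
    if count > 0:                # step 5: at least one shipper?
        set_counter(count)       # update the counter
    sleep(300)                   # both branches wait 5 minutes...
    reset_counter()              # step 6: ...then auto-reset
```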
Code Implementation
1. Trigger & Conditions
alias: Detect human shipper at the gate
description: ""
triggers:
  - trigger: mqtt
    topic: frigate/events
conditions:
  - condition: template
    value_template: |
      {{ trigger.payload_json['after']['label'] == 'person' }}
The automation fires only when Frigate detects a person – an initial filter step that avoids spamming the LLM.
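The same filter can be expressed as a small Python sketch for offline testing. The payload shape follows Frigate's MQTT event format; the function name is illustrative:

```python
import json

def is_person_event(payload: str) -> bool:
    """Equivalent of the value_template above: accept a Frigate MQTT
    event only when its 'after' snapshot is labeled 'person'."""
    event = json.loads(payload)
    return event.get("after", {}).get("label") == "person"
```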
2. LLM Integration
actions:
  - action: ai_task.generate_data
    response_variable: shipper_count
    data:
      instructions: >-
        You are an image analysis assistant.
        Task: Count the shippers (delivery workers) standing in front of
        the gate in the image.
        Rules:
        - Rely only on occupational cues: delivery uniforms, helmets with
          logos, courier-company jackets, delivery bags, parcel boxes,
          delivery motorbikes, pickup/delivery behavior.
        - Do not identify individuals.
        - Count only people with clear shipper cues.
        - Ignore passers-by or anyone without shipper cues.
        - If the gate is unclear or not visible in the frame, count only
          shippers in the area in front of the camera.
        - Return exactly one line in the format: "shipper_count: <number>"
        Output: A single line only, no extra line breaks.
      entity_id: ai_task.gemini_flash
      attachments:
        media_content_id: media-source://camera/camera.gate_camera
        media_content_type: application/vnd.apple.mpegurl
        metadata:
          title: Gate Camera
          thumbnail: /api/camera_proxy/camera.gate_camera
          media_class: video
          children_media_class: null
          navigateIds:
            - {}
            - media_content_type: app
              media_content_id: media-source://camera
      task_name: Camera AI
Key points:
- Prompt Engineering: Detailed instructions about visual cues to identify shippers
- Privacy: Instruct the LLM not to recognize personal identities
- Structured Output: Standard format "shipper_count:" for easy parsing
- Media Attachment: Directly sending the stream from the camera
3. Data Processing & State Management
  - if:
      - condition: template
        value_template: >-
          {{ (shipper_count['data'] | regex_findall_index('([0-9]+)', 0) |
          int(0)) > 0 }}
    then:
      - action: counter.set_value
        data:
          value: >-
            {{ shipper_count['data'] | regex_findall_index('([0-9]+)', 0) |
            int(0) }}
        target:
          entity_id:
            - counter.shipper_count
  - delay:
      minutes: 5
  - action: counter.reset
    data: {}
    target:
      entity_id:
        - counter.shipper_count
mode: single
- Regex Parsing: Extract the number from the LLM response
- Conditional Logic: Only update when there is at least one shipper (> 0)
- Auto Cleanup: Reset after 5 minutes
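The Jinja extraction can be mirrored in Python to test the parsing logic offline; `extract_shipper_count` is a hypothetical helper, not part of Home Assistant:

```python
import re

def extract_shipper_count(llm_response: str) -> int:
    """Mirror of `regex_findall_index('([0-9]+)', 0) | int(0)`:
    take the first integer in the response, defaulting to 0."""
    matches = re.findall(r"[0-9]+", llm_response)
    return int(matches[0]) if matches else 0
```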
Advanced Use Cases
Multi-language Support
instructions: >-
  Task: Count delivery persons at gate.
  Output format: "shipper_count: <number>"
  Language: English
Notification Integration
- action: notify.mobile_app
  data:
    title: "🛵 Shipper detected"
    message: >-
      {{ shipper_count['data'] | regex_findall_index('([0-9]+)', 0) }}
      shipper(s) at the gate
History Tracking
- action: logbook.log
  data:
    name: Shipper Detection
    message: >-
      Shipper detected: {{ shipper_count['data'] }}
Performance Considerations
Cost Optimization
- Use Gemini Flash instead of Pro for real-time scenarios
- Implement rate limiting to avoid spamming LLM calls
- Cache results for similar frames
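One way to rate-limit LLM calls is a minimal cooldown gate, sketched below in Python; the class name and threshold are illustrative, not part of any library:

```python
import time

class CooldownGate:
    """Allow at most one LLM call per `cooldown` seconds; events
    arriving inside the window are dropped instead of billed."""
    def __init__(self, cooldown: float):
        self.cooldown = cooldown
        self.last_call = float("-inf")  # first call always passes

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.last_call >= self.cooldown:
            self.last_call = now
            return True
        return False
```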
Latency Management
- Pre-process with local AI before sending to the LLM
- Use streaming responses for faster feedback
- Implement timeout mechanisms
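A timeout mechanism can be sketched with Python's standard library; `fallback` and the function names are assumptions for illustration, not Home Assistant API:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def call_with_timeout(fn, timeout_s: float = 10.0, fallback: int = 0) -> int:
    """Run a (hypothetical) LLM call with a hard deadline; a slow
    response degrades to `fallback` instead of stalling the automation."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn).result(timeout=timeout_s)
    except FutureTimeout:
        return fallback
    finally:
        pool.shutdown(wait=False)
```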
Security & Privacy
- Local pre-filtering: person detection runs entirely on the local network; an image reaches the LLM only after this first pass flags a person
- Data minimization: Only send frames where a person is detected
- Anonymity: The LLM is instructed not to identify personal identities
- Storage policy: Automatically delete media after processing
Debugging
LLM Response Format
# Test template
{{ shipper_count['data'] }}
# Should return: "shipper_count: 2"
Camera Integration
# Verify camera stream
media_content_id: media-source://camera/camera.gate_camera
Regex Extraction
# Debug regex
{{ shipper_count['data'] | regex_findall_index('([0-9]+)', 0) }}
Future Advanced Features
- Multi-object detection: Extend detection to postal workers, delivery trucks, etc.
- Behavior analysis: Analyze behaviors (waiting, delivering, leaving, etc.)
- Smart lock integration: Automatically open the gate when a shipper is confirmed
- Voice notifications: Announce detections via text-to-speech on smart speakers
Conclusion
Combining LLMs with Home Assistant turns a smart home from "reactive" to "proactive" – not only reacting but also understanding and predicting needs. From the simple task of counting shippers to more complex applications like behavior analysis, LLMs bring "real intelligence" to automation systems.
Have you implemented similar AI-powered automations? Share your experience!