Agent Console
Model dqn-feeding · v1.4
Last retrain · 18 days ago
3 cages live
Inference healthy
Sun, May 24
Reinforcement learning · Adaptive feeding

How the agent is learning today.

FeedRight runs a Double-DQN policy across every cage, picking from six feed actions every three hours. Below is the live read on its decisions, reward signal, and how close it is to its next adaptive retrain.

Policy snapshot
Architecture
512 · 256 · 128 · 64
Parameters
~200K
Replay buffer
50,000 transitions
Exploration ε
0.05 (final)
Discount γ
0.99
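For readers unfamiliar with Double-DQN, the update target can be sketched as below. This is a minimal illustration, not FeedRight's training code: the function names and the example Q-values are hypothetical, and only the discount γ = 0.99 and the six-action layout come from the snapshot above. The online network picks the next action and the target network scores it, which is what separates Double-DQN from plain DQN.

```python
GAMMA = 0.99  # discount factor, taken from the policy snapshot


def argmax(values):
    """Index of the largest entry in a list of Q-values."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=GAMMA):
    """Double-DQN target: the online net selects, the target net evaluates.

    next_q_online / next_q_target are the six Q-values each network assigns
    to the next state (hypothetical inputs for illustration).
    """
    if done:
        return reward  # no bootstrapping past a terminal transition
    best_action = argmax(next_q_online)
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for the six feed actions (Skip ... Maximum).
online = [1.0, 2.0, 3.0, 7.0, 6.5, 4.0]
target = [0.9, 2.1, 2.8, 6.0, 6.8, 4.2]

# The online net prefers action 3, so the target net's value for action 3
# (6.0) is used, not the target net's own maximum (6.8).
y = double_dqn_target(2.4, online, target, done=False)
```

Decoupling selection from evaluation this way is the standard remedy for the overestimation bias of a plain max over the target network.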
Avg reward · 24h
+1.02
vs +0.92 prior day
Decisions · 24h
8
across 3 cages
Avg confidence
65%
softmax over Q-values
Optimal feed rate
50%
4 of 8 cycles
Safety overrides
4
2 reduced · 2 skipped
Inference latency
42 ms
p95 last hour
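The "softmax over Q-values" confidence figure can be read as the probability mass a softmax assigns to the chosen action. The sketch below assumes a temperature of 1.0 and that confidence is the share of the greedy action; neither detail is confirmed by the console, and the function names are hypothetical.

```python
import math


def softmax(q_values, temperature=1.0):
    """Numerically stable softmax over a list of Q-values."""
    peak = max(q_values)  # subtract the max so exp() cannot overflow
    exps = [math.exp((q - peak) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(q_values, temperature=1.0):
    """Return (greedy action index, softmax probability of that action)."""
    probs = softmax(q_values, temperature)
    action = max(range(len(probs)), key=lambda i: probs[i])
    return action, probs[action]


# Six equal Q-values give the minimum possible confidence, 1/6.
_, flat_conf = confidence([1.0] * 6)

# One clearly dominant Q-value pushes confidence toward 100%.
best, sharp_conf = confidence([0.0, 0.0, 0.0, 5.0, 0.0, 0.0])
```

Under this reading, a low confidence means the Q-values for several feed amounts are close together, not that the top Q-value itself is small.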
Reward signal · last 24h

Reward trajectory

24h mean
1.04
24h peak
2.80
24h trough
-1.80
Action distribution · 7d

Where the policy spent its mass

Action · Amount · Count · Share
Skip · 0.0 kg · 9 · 13%
Light · 0.5 kg · 4 · 6%
Modest · 1.0 kg · 8 · 12%
Standard · 2.0 kg · 22 · 32%
Heavy · 3.5 kg · 19 · 27%
Maximum · 5.0 kg · 7 · 10%
Adaptive retrain
75%
753 of 1,000 experiences
Projected uplift
+0.45 reward
ETA at current rate
~6 days
Last validated
+0.6 vs prior
Decision log

Latest decisions

8 cycles
Every 3 hours · per cage
Cage M1 · Fed
08:42 · 3h ago

Optimal feeding window — high motion intensity, oxygen above target band.

DO
7.2 mg/L
Temp
28.5°C
O₂ sat
92%
Wind
5.2 m/s
Waste
12%
Δ feed
3.2 h
Today
2 feeds
2.5 kg
Confidence
87%
Reward
+2.40
Q-value
7.18
Cage M2 · Reduced
05:42 · 6h ago
Safety override applied

Reduced amount — oxygen saturation 72 percent, declining trend.

DO
5.8 mg/L
Temp
29.2°C
O₂ sat
72%
Wind
6.1 m/s
Waste
18%
Δ feed
3.5 h
Today
2 feeds
1.5 kg
3.2 kg policy output
Confidence
54%
Reward
+0.60
Q-value
4.12
Cage M3 · Skipped
02:42 · 9h ago
Safety override applied

Feed blocked — DO at 4.3 mg/L breaches the 4.5 mg/L safety floor.

DO
4.3 mg/L
Temp
30.1°C
O₂ sat
68%
Wind
8.5 m/s
Waste
25%
Δ feed
2.8 h
Today
3 feeds
No feed
2.8 kg policy output
Confidence
42%
Reward
-1.80
Q-value
2.05
Cage M1 · Fed
23:42 · 12h ago

High activity, excellent water quality, strong feeding response in last cycle.

DO
7.8 mg/L
Temp
27.9°C
O₂ sat
95%
Wind
4.2 m/s
Waste
8%
Δ feed
3.8 h
Today
1 feed
3.1 kg
Confidence
91%
Reward
+2.80
Q-value
7.95
Cage M2 · Reduced
20:42 · 15h ago
Safety override applied

Amount halved — temperature 30.5°C and waste rate trending up.

DO
6.2 mg/L
Temp
30.5°C
O₂ sat
78%
Wind
7.8 m/s
Waste
22%
Δ feed
3.1 h
Today
3 feeds
1.8 kg
3.5 kg policy output
Confidence
48%
Reward
+0.40
Q-value
3.61
Cage M3 · Fed
17:42 · 18h ago

Standard cycle — moderate hunger indicators, conditions in nominal band.

DO
7.0 mg/L
Temp
28.2°C
O₂ sat
88%
Wind
5.8 m/s
Waste
14%
Δ feed
4.1 h
Today
2 feeds
2.8 kg
Confidence
83%
Reward
+2.10
Q-value
6.82
Cage M1 · Skipped
14:42 · 21h ago
Safety override applied

Feed blocked — minimum interval 1.5h not met (1.2h since last feed).

DO
6.8 mg/L
Temp
29.0°C
O₂ sat
84%
Wind
6.5 m/s
Waste
16%
Δ feed
1.2 h
Today
4 feeds
No feed
2.2 kg policy output
Confidence
38%
Reward
-0.20
Q-value
1.94
Cage M2 · Fed
11:42 · 24h ago

Scheduled cycle, normal conditions, adequate digestion window.

DO
7.4 mg/L
Temp
28.0°C
O₂ sat
90%
Wind
5.0 m/s
Waste
11%
Δ feed
3.6 h
Today
1 feed
2.2 kg
Confidence
79%
Reward
+1.90
Q-value
6.21
Active guardrails

Hard constraints layered on the policy

Source · agent.py · safety_constraints
Dissolved oxygen floor
< 4.5 mg/L blocks feed
Temperature window
feed only 23–31 °C
Daily cap
≥ 6 feeds/day blocks feed
Digestion interval
< 1.5 h since last feed blocks feed
Wind threshold
> 15 m/s blocks feed
Saturation floor
O₂ sat < 65% blocks feed
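The six hard constraints above can be expressed as a single check that runs before any feed action is executed. This is a minimal sketch, not the contents of agent.py: the `CageState` fields and `blocking_reasons` are hypothetical names, the temperature window is assumed to be inclusive at both ends, and only the thresholds come from the guardrail list. The "Reduced" overrides seen in the decision log are not covered, since their rules are not listed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CageState:
    """Sensor readings and feed history for one cage (hypothetical fields)."""
    do_mg_l: float           # dissolved oxygen, mg/L
    temp_c: float            # water temperature, °C
    o2_sat_pct: float        # oxygen saturation, %
    wind_m_s: float          # wind speed, m/s
    hours_since_feed: float  # time since the last feed, h
    feeds_today: int         # feeds already delivered today


def blocking_reasons(s: CageState) -> List[str]:
    """Return every hard constraint that blocks a feed; empty list means allowed."""
    reasons = []
    if s.do_mg_l < 4.5:
        reasons.append("dissolved oxygen below 4.5 mg/L")
    if not (23.0 <= s.temp_c <= 31.0):  # inclusive bounds are an assumption
        reasons.append("temperature outside 23-31 °C")
    if s.feeds_today >= 6:
        reasons.append("daily cap of 6 feeds reached")
    if s.hours_since_feed < 1.5:
        reasons.append("less than 1.5 h since last feed")
    if s.wind_m_s > 15.0:
        reasons.append("wind above 15 m/s")
    if s.o2_sat_pct < 65.0:
        reasons.append("oxygen saturation below 65%")
    return reasons


# Readings copied from three entries in the decision log.
m3_0242 = CageState(4.3, 30.1, 68.0, 8.5, 2.8, 3)  # skipped: DO floor
m1_1442 = CageState(6.8, 29.0, 84.0, 6.5, 1.2, 4)  # skipped: interval
m1_0842 = CageState(7.2, 28.5, 92.0, 5.2, 3.2, 2)  # fed
```

Returning the full list of reasons, instead of stopping at the first violation, keeps the override message informative when several limits are breached at once.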