How the agent
is learning today.
FeedRight runs a Double-DQN policy across every cage, picking from six feed actions every three hours. Below is the live read on its decisions, reward signal, and how close it is to its next adaptive retrain.
- Architecture
- 512 · 256 · 128 · 64
- Parameters
- ~200K
- Replay buffer
- 50,000 transitions
- Exploration ε
- 0.05 (final)
- Discount γ
- 0.99
Reward trajectory
Where the policy spent its mass
- Projected uplift
- +0.45 reward
- ETA at current rate
- ~6 days
- Last validated
- +0.6 vs prior
Latest decisions
Optimal feeding window — high motion intensity, oxygen above target band.
Reduced amount — oxygen saturation 72 percent, declining trend.
Feed blocked — DO at 4.3 mg/L breaches the 4.5 mg/L safety floor.
High activity, excellent water quality, strong feeding response in last cycle.
Amount halved — temperature 30.5°C and waste rate trending up.
Standard cycle — moderate hunger indicators, conditions in nominal band.
Feed blocked — minimum interval 1.5h not met (1.2h since last feed).
Scheduled cycle, normal conditions, adequate digestion window.