Operating cage
Waiting
Autonomous · AI decides and actuates feeding
sb3-dqn · v3.2.1
1,842 experiences buffered
Next snapshot in 1800s

Avg reward (24h)

1.55

+0.32 vs prev

Consumption

91%

target ≥ 90%

FCR (7d)

1.41

−0.06 vs baseline

Daily feed

18.5 kg

−2.1 kg waste

Decisions (24h)

142

6 overrides

Cost saved (wk)

$1,840

projected

Live AI decision · Pen L2
Next decision in 14 min

Selected action · idx 3

Medium · 2.0 kg

High feeding-frenzy score (0.81) · DO normal · Within daily cap

Confidence

87%

No safety override

Action-value distribution (Q)

a0 · Hold

0.0 kg

Q -0.62

a1 · Probe

0.5 kg

Q +0.04

a2 · Modest

1.0 kg

Q +0.41

a3 · Standard

2.0 kg

Q +0.78

chosen

a4 · Hungry

3.5 kg

Q +0.55

a5 · Optimal

5.0 kg

Q +0.13
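
The "chosen" marker on a3 is consistent with greedy selection, where the policy takes the action with the highest Q-value. The dashboard does not state the selection rule, so the following is only a minimal sketch under that assumption. The `ACTIONS` list and `greedy_action` helper are illustrative names, and the 87% confidence figure is not derived here because its formula is not shown.

```python
# Action labels, feed amounts (kg), and Q-values copied from the panel above.
ACTIONS = [
    ("a0", "Hold", 0.0, -0.62),
    ("a1", "Probe", 0.5, 0.04),
    ("a2", "Modest", 1.0, 0.41),
    ("a3", "Standard", 2.0, 0.78),
    ("a4", "Hungry", 3.5, 0.55),
    ("a5", "Optimal", 5.0, 0.13),
]


def greedy_action(actions):
    """Return the index of the action with the highest Q-value."""
    return max(range(len(actions)), key=lambda i: actions[i][3])


idx = greedy_action(ACTIONS)
name, label, feed_kg, q = ACTIONS[idx]
print(f"chosen: {name} · {label} · {feed_kg} kg · Q {q:+.2f}")
```

With the values shown, this picks index 3 (2.0 kg), matching "Selected action · idx 3".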

Environmental

DO 6.4 mg/L

Healthy oxygen, no throttle.

Vision

Frenzy 0.81 / Motion 78%

Strong visual hunger signal · supports feeding.

Safety gate

PASS

7 / 7 hard rules · 3 / 3 soft rules.

Observation vector · 44 features
Box(0,1) · float32

Inputs to the DQN policy, normalized to [0,1] before inference.

Environmental · 13 dims
dissolved_oxygen: 6.40 mg/L
temperature: 27.10 °C
salinity: 33.20 ppt
oxygen_saturation: 88%
turbidity: 12 NTU
wind_speed: 4.10 m/s
wind_dir: 142°
cloud_cover: 35%
temp_change_1h: 0.30 °C
oxygen_trend_3h: -0.05 mg/L
salinity_trend: 0.10
time_of_day: 9.70 h
season_idx: 0

Biomass · 6 dims
total_biomass: 6985 kg
fish_count: 3520
avg_weight: 1984 g
avg_length: 38.20 cm
days_in_cage: 184 d
species_idx: 0

Computer Vision · 5 dims
motion_intensity: 78%
feeding_frenzy_score: 0.81
surface_activity: 0.62
school_density: 18.40
pellets_visible: 145

Feeding History · 9 dims
time_since_last_feed: 2.30 h
last_feed_amount: 2 kg
last_consumption: 0.96
cumulative_today: 18.50 kg
feeds_today: 3
avg_interval_24h: 3.20 h
pellet_size: 6 mm
pellet_protein: 0.45
feed_density: 1.30

Performance · 5 dims
fcr_7d: 1.41
sgr_7d: 1.80
mortality_30d: 0.40%
feed_waste_rate: 0.08
reward_avg_24h: 1.55

Cage · 6 dims
depth: 12 m
volume: 2200
surface_area: 18 m
stocking_density: 1.20
mesh_age: 47 d
cage_age: 184 d

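
The panel says inputs are normalized to [0,1] before inference but does not show how. One common approach is min-max scaling with clipping, sketched below. The `BOUNDS` ranges are hypothetical placeholders, not values from the dashboard, and `normalize` is an illustrative helper. The group sizes are taken from the panel and do sum to the stated 44 features.

```python
# Feature-group sizes as listed in the observation panel.
GROUP_DIMS = {
    "Environmental": 13,
    "Biomass": 6,
    "Computer Vision": 5,
    "Feeding History": 9,
    "Performance": 5,
    "Cage": 6,
}
TOTAL_DIMS = sum(GROUP_DIMS.values())  # should match the 44-feature header

# ASSUMPTION: (low, high) bounds per feature. The dashboard does not list
# them, so these numbers are placeholders for illustration only.
BOUNDS = {
    "dissolved_oxygen": (0.0, 12.0),      # mg/L
    "temperature": (15.0, 35.0),          # °C
    "feeding_frenzy_score": (0.0, 1.0),   # already a unit-interval score
}


def normalize(name, value, bounds=BOUNDS):
    """Min-max scale a raw reading into [0, 1], clipping out-of-range values."""
    low, high = bounds[name]
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))


# Raw readings copied from the panel.
raw = {"dissolved_oxygen": 6.40, "temperature": 27.10, "feeding_frenzy_score": 0.81}
obs = [normalize(name, value) for name, value in raw.items()]
print(TOTAL_DIMS, [round(x, 3) for x in obs])
```

Clipping keeps a sensor spike from pushing a feature outside the Box(0,1) range the policy was trained on.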
Reward signal · last 24 h
Excellent +3.0 · Good +1.5 · Wasteful −2.0 · Dangerous −4.0

Mean reward

+1.62

Best decision

+3.0

Worst decision

−3.5

Adaptive retraining

Real-world experiences feed the SB3 replay buffer. Retraining triggers once the buffer reaches the minimum and is auto-deployed if the new policy beats the live one.

Experience buffer

1,842 / 1,000

Ready to retrain

Live policy

+1.18

Candidate policy

+1.84

+56% reward
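
The retraining flow described above can be sketched as two checks: whether the buffer has reached its minimum, and whether the candidate beats the live policy. The function names are illustrative, and the dashboard does not say how policies are evaluated or whether any margin is required, so a plain "greater than" comparison is assumed here.

```python
MIN_BUFFER = 1_000  # minimum shown in the "Experience buffer" card


def ready_to_retrain(buffered, minimum=MIN_BUFFER):
    """Retraining triggers once the buffer reaches the minimum."""
    return buffered >= minimum


def improvement_pct(live, candidate):
    """Relative reward gain of the candidate over the live policy, in percent.

    Assumes a non-zero live reward.
    """
    return (candidate - live) / abs(live) * 100.0


def should_deploy(live, candidate):
    """ASSUMPTION: deploy on any strict improvement, with no safety margin."""
    return candidate > live


print(ready_to_retrain(1_842))                 # buffer from the card
print(round(improvement_pct(1.18, 1.84)))      # live vs candidate from the cards
print(should_deploy(1.18, 1.84))
```

With the card values, (1.84 − 1.18) / 1.18 is about 55.9%, which rounds to the +56% shown.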

Model history
  • v3.2.1 · May 08 · +1.84 · deployed
  • v3.2.0 · Apr 24 · +1.62 · deployed
  • v3.1.4 · Apr 03 · +1.41 · deployed
  • v3.1.3 · Mar 19 · +1.18 · rejected
  • v3.1.2 · Feb 28 · +1.05 · deployed
Safety constraints
10 / 10 passing

Hard rules · block feeding entirely

Critical O₂ · pass

DO ≥ 4.5 mg/L

Now: 6.4 mg/L

O₂ saturation · pass

Sat ≥ 65%

Now: 88%

Heat stress · pass

Temp ≤ 31 °C

Now: 27.1 °C

Cold stress · pass

Temp ≥ 23 °C

Now: 27.1 °C

Daily cap · pass

Feeds < 6 / day

Now: 3 / 6

Digestion window · pass

≥ 1.5 h since feed

Now: 2.3 h

Wind exposure · pass

Wind < 15 m/s

Now: 4.1 m/s

Soft rules · throttle the amount

Low O₂ throttle · pass

Cap to 30% if DO < 5.5

Now: 6.4 mg/L

Rapid temp change · pass

Cap to 60% if Δ>1.5°C/h

Now: 0.3 °C/h

High waste · pass

Cap if waste > 20%

Now: 8%
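
The two rule tiers above can be sketched as a gate that first checks the hard rules (any failure blocks the feed) and then applies soft-rule caps. This is an illustration under assumptions, not the system's actual code: `Reading` and `apply_safety_gate` are hypothetical names, the caps are assumed to scale the requested amount, and the high-waste rule is omitted because its cap value is not shown.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    do_mg_l: float              # dissolved oxygen, mg/L
    o2_sat_pct: float           # oxygen saturation, %
    temp_c: float               # water temperature, °C
    feeds_today: int
    hours_since_feed: float
    wind_m_s: float
    temp_change_c_per_h: float


def hard_rule_failures(r):
    """Return the names of failed hard rules (thresholds from the panel)."""
    checks = {
        "Critical O2": r.do_mg_l >= 4.5,
        "O2 saturation": r.o2_sat_pct >= 65,
        "Heat stress": r.temp_c <= 31,
        "Cold stress": r.temp_c >= 23,
        "Daily cap": r.feeds_today < 6,
        "Digestion window": r.hours_since_feed >= 1.5,
        "Wind exposure": r.wind_m_s < 15,
    }
    return [name for name, ok in checks.items() if not ok]


def apply_safety_gate(requested_kg, r):
    """Return (applied_kg, failed_hard_rules) for a requested feed amount."""
    failures = hard_rule_failures(r)
    if failures:
        return 0.0, failures            # hard rules block feeding entirely
    factor = 1.0
    if r.do_mg_l < 5.5:                 # Low O2 throttle
        factor = min(factor, 0.30)
    if abs(r.temp_change_c_per_h) > 1.5:  # Rapid temp change
        factor = min(factor, 0.60)
    # ASSUMPTION: caps multiply the requested amount; the strictest cap wins.
    return requested_kg * factor, []


now = Reading(6.4, 88, 27.1, 3, 2.3, 4.1, 0.3)   # "Now" values from the panel
print(apply_safety_gate(2.0, now))
```

With the "Now" values every rule passes and the 2.0 kg request goes through unchanged, while a DO reading of 4.3 mg/L would fail the Critical O₂ rule and return 0.0 kg.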

Recent decisions
Last 8 events · all cages
Time | Cage | Raw → Applied | Consumed | Reward | Reason
09:42 | L1 | 2.0 kg | 96% | +3.0 | Excellent timing · DO 7.2
09:38 | L4 | 3.5 kg → 0.0 kg | | -3.5 | DO 4.3 mg/L < 4.5 threshold
09:30 | L3 | 3.5 kg | 93% | +1.5 | High frenzy · Heavy feed
09:22 | L2 | 2.0 kg → 1.0 kg | 78% | +0.4 | Throttled (DO trending down)
09:15 | L1 | 1.0 kg | 99% | +3.0 | Low-amount probe successful
09:08 | L3 | 0.5 kg | 95% | +1.5 | Appetite probe
09:01 | L4 | 0.0 kg | | +0.5 | AI chose to wait: DO trending down
08:54 | L2 | 2.0 kg | 91% | +1.5 | Within tolerance

Fleet overview