Adaptive Learning
How FeedRight continuously improves through real-world experience collection, persistent storage, and periodic model retraining.
The initial DQN model is pretrained in simulation. Adaptive learning closes the sim-to-real gap by collecting actual feeding outcomes from deployed cages and periodically retraining the model on this real-world data.
System Components
| Component | Module | Role |
|---|---|---|
| ExperienceDatabase | lib/dqn_sb3/experience_db.py | Thread-safe SQLite storage for (s, a, r, s', done) tuples |
| AdaptiveRetrainingService | lib/dqn_sb3/adaptive_retrain.py | Orchestrates data loading, replay-buffer prefilling, training, evaluation and deployment |
| AdaptiveLearningMonitor | lib/dqn_sb3/monitoring.py | Dashboards, trend plots, and JSON report export |
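
The table describes ExperienceDatabase as thread-safe SQLite storage, and the design notes attribute this to threading.local() connections. The sketch below shows one way that pattern is commonly written. The class name `ThreadLocalSQLite` and its method are hypothetical and are not taken from lib/dqn_sb3/experience_db.py.

```python
import sqlite3
import threading


class ThreadLocalSQLite:
    """Hypothetical helper: hands each thread its own SQLite connection."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._local = threading.local()  # attributes set here are per-thread

    def connection(self) -> sqlite3.Connection:
        # Lazily open one connection per thread and reuse it afterwards,
        # so no connection object is ever shared between threads.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self._db_path)
            self._local.conn = conn
        return conn
```

Because sqlite3 connections refuse cross-thread use by default, giving every thread its own connection avoids that restriction without a global lock around each query.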
Experience Collection Flow
CageFeedingAgent     ExperienceDatabase
│                                     │
│ decide_feeding(state)               │
│ ─► stores _last_state, _last_action │
│                                     │
│ ... cage feeds fish ...             │
│ ... outcome measured ...            │
│                                     │
│ record_outcome(reward, next_state)  │
│────────────────────────────────────►│
│        INSERT INTO experiences      │
│        (cage_id, state, action,     │
│         feed_amount, reward,        │
│         next_state, done, metadata) │
│                                     │
When to call record_outcome
A cron job captures a data snapshot from all connected data sources every 30 minutes. The agent uses each snapshot to:
- Measure actual consumption (e.g., via pellet waste sensors or video analysis).
- Compute a reward from real metrics (actual FCR, actual waste rate, fish health indicators).
- Capture the new state vector based on the snapshot.
- Call agent.record_outcome(reward=..., next_state=..., metadata={...}).
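
The two-phase pattern above (decide first, record the outcome later) can be sketched as follows. This is a minimal illustration, not the CageFeedingAgent implementation: `TwoPhaseRecorder`, `Experience`, and the `policy` and `sink` callables are hypothetical names, and only `_last_state`, `_last_action`, `decide_feeding` and `record_outcome` come from the flow diagram.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Experience:
    state: List[float]
    action: int
    reward: float
    next_state: Optional[List[float]]
    done: bool = False
    metadata: dict = field(default_factory=dict)


class TwoPhaseRecorder:
    """Hypothetical stand-in for the agent's decide/record bookkeeping."""

    def __init__(self, policy: Callable[[Sequence[float]], int],
                 sink: Callable[[Experience], None]) -> None:
        self._policy = policy   # maps a state vector to an action index
        self._sink = sink       # receives each completed experience
        self._last_state: Optional[List[float]] = None
        self._last_action: Optional[int] = None

    def decide_feeding(self, state: Sequence[float]) -> int:
        # Phase 1: remember what was observed and what was decided.
        self._last_state = list(state)
        self._last_action = self._policy(state)
        return self._last_action

    def record_outcome(self, reward: float, next_state: Sequence[float],
                       done: bool = False, metadata: Optional[dict] = None) -> None:
        # Phase 2: pair the stored decision with the measured outcome.
        if self._last_state is None or self._last_action is None:
            raise RuntimeError("record_outcome called before decide_feeding")
        self._sink(Experience(self._last_state, self._last_action, reward,
                              list(next_state), done, metadata or {}))
        # Clear the pending decision so it cannot be recorded twice.
        self._last_state = None
        self._last_action = None
```

Clearing the pending decision after each record is an assumption of this sketch; it guards against a late snapshot being attributed to the same action twice.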
Database Schema
experiences table
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Auto-incrementing ID |
| cage_id | TEXT | Cage identifier (e.g., "CAGE-001") |
| timestamp | DATETIME | When the experience occurred |
| state | TEXT (JSON) | 44-element state vector serialised as JSON array |
| action | INTEGER | Discrete action index (0–5) |
| feed_amount | REAL | Actual kg dispensed |
| reward | REAL | Observed reward |
| next_state | TEXT (JSON) | Next state vector (nullable) |
| done | INTEGER | Episode termination flag |
| metadata | TEXT (JSON) | Extra info (safety overrides, confidence, weather) |
Indexed on (cage_id, timestamp), timestamp, and cage_id for fast range queries.
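
As a rough illustration, the experiences table and its three indexes could be created with Python's built-in sqlite3 module as shown below. The column names and types follow the table above; the NOT NULL constraints, default values, index names, and the `open_db` and `insert_experience` helpers are assumptions made for this sketch.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS experiences (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    cage_id     TEXT NOT NULL,
    timestamp   DATETIME DEFAULT CURRENT_TIMESTAMP,
    state       TEXT NOT NULL,      -- JSON array
    action      INTEGER NOT NULL,
    feed_amount REAL,
    reward      REAL,
    next_state  TEXT,               -- JSON array, nullable
    done        INTEGER DEFAULT 0,
    metadata    TEXT                -- JSON object
);
CREATE INDEX IF NOT EXISTS idx_exp_cage_time ON experiences (cage_id, timestamp);
CREATE INDEX IF NOT EXISTS idx_exp_time ON experiences (timestamp);
CREATE INDEX IF NOT EXISTS idx_exp_cage ON experiences (cage_id);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)  # creates the table and indexes if missing
    return conn


def insert_experience(conn, cage_id, state, action, feed_amount, reward,
                      next_state=None, done=False, metadata=None) -> int:
    # Vectors and metadata are stored as JSON text; a missing next_state
    # is stored as NULL rather than as the JSON string "null".
    cur = conn.execute(
        "INSERT INTO experiences (cage_id, state, action, feed_amount, reward,"
        " next_state, done, metadata) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (cage_id, json.dumps(list(state)), action, feed_amount, reward,
         None if next_state is None else json.dumps(list(next_state)),
         int(done), json.dumps(metadata or {})),
    )
    conn.commit()
    return cur.lastrowid
```

Storing the state as JSON text keeps the schema independent of the vector length, at the cost of decoding each row with json.loads when a training dataset is loaded.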
model_performance table
Tracks evaluation metrics per model version over time.
retraining_history table
Logs every retraining attempt with status (started, data_loaded, completed, failed), performance improvement, error messages, and configuration.
Retraining Pipeline
The AdaptiveRetrainingService.retrain_model() method executes the following steps:
1. Check readiness
└─ total experiences ≥ min_experiences (default 1 000)?
2. Load real data
└─ ExperienceDatabase.get_training_dataset()
└─ Optional filtering: cage_id, start_date, min_reward
└─ Max 50 000 samples to cap memory
3. Evaluate baseline model
└─ evaluate_policy(current_model, env, n=20)
└─ Records mean ± std reward
4. Pre-fill replay buffer
└─ Iterate real (s, a, r, s', done) tuples
└─ replay_buffer.add() for each experience
└─ Buffer now mixes real + simulated data
5. Continue training
└─ model.learn(total_timesteps=50 000), i.e. 50 000 additional timesteps
└─ reset_num_timesteps=False (continues step count)
6. Evaluate retrained model
└─ evaluate_policy(retrained_model, env, n=20)
└─ Compute improvement = retrained - baseline
7. Save & log
└─ Save to models/retrained/model_{cage}_{timestamp}.zip
└─ Log results to retraining_history table
Auto-Retrain Schedule
auto_retrain_schedule() is designed to be called periodically (e.g., weekly via cron):
result = service.auto_retrain_schedule(
    current_model_path="./models/best_model/best_model.zip",
    check_every_days=7,
    min_new_experiences=500,
    improvement_threshold=0.5,
)
The flow:
- Count experiences from the last check_every_days days.
- If fewer than min_new_experiences, skip.
- Run the full retraining pipeline.
- If improvement < improvement_threshold, save but do not deploy.
- If improvement meets the threshold:
  - Back up the current production model with a timestamp suffix.
  - Copy the retrained model to the production path.
  - Return deployed: true.
This conservative deployment strategy means a retrained model reaches production only when its mean evaluation reward exceeds the baseline by at least improvement_threshold.
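
The deployment decision at the end of the flow can be sketched with the standard library alone. The function name `deploy_if_improved`, its return shape, and the exact timestamp format are assumptions for illustration; only the threshold comparison, the timestamped backup, and the copy to the production path come from the steps above.

```python
import shutil
from datetime import datetime
from pathlib import Path


def deploy_if_improved(retrained_path, production_path, improvement,
                       improvement_threshold=0.5):
    """Hypothetical helper mirroring the deploy step of the flow above."""
    retrained = Path(retrained_path)
    production = Path(production_path)

    # Below the threshold: keep the retrained file but leave production alone.
    if improvement < improvement_threshold:
        return {"deployed": False, "backup_path": None}

    backup = None
    if production.exists():
        # Back up the current production model with a timestamp suffix.
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        backup = production.with_name(
            f"{production.stem}_{stamp}{production.suffix}")
        shutil.copy2(production, backup)

    # Promote the retrained model to the production path.
    production.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(retrained, production)
    return {"deployed": True,
            "backup_path": str(backup) if backup else None}
```

Rolling back is then a matter of copying the file at `backup_path` over the production path again.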
Monitoring
AdaptiveLearningMonitor provides:
| Method | Output |
|---|---|
| print_summary() | Text report: total experiences, avg reward, retraining readiness |
| plot_reward_trends() | Scatter + moving average of rewards over time |
| plot_action_distribution() | Bar chart of action frequency + histogram of feed amounts |
| plot_learning_progress() | Cumulative average reward + windowed moving average |
| export_report() | JSON file with all metrics, action distribution, and readiness status |
All plotting methods accept an optional cage_id filter and a save_path for headless environments.
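
The two smoothing curves named in the table, a cumulative average and a windowed moving average, can be computed as in the sketch below. The function names and the handling of the first few points (averaging over however many rewards are available so far) are assumptions; the monitor's actual window size and edge behaviour are not specified here.

```python
from collections import deque
from typing import Iterable, List


def cumulative_average(rewards: Iterable[float]) -> List[float]:
    """Mean of all rewards seen so far, one value per input reward."""
    out, total = [], 0.0
    for count, reward in enumerate(rewards, start=1):
        total += reward
        out.append(total / count)
    return out


def moving_average(rewards: Iterable[float], window: int) -> List[float]:
    """Mean of the most recent `window` rewards (fewer at the start)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # drops the oldest reward automatically
    out = []
    for reward in rewards:
        recent.append(reward)
        out.append(sum(recent) / len(recent))
    return out
```

The cumulative average shows long-run drift, while the windowed average reacts faster to recent changes such as a newly deployed model.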
Key Design Decisions
- SQLite over cloud DB: Keeps the system self-contained and deployable on edge hardware at the farm site. Thread-safety is handled via threading.local() connections.
- Pre-fill, don't replace: Real experiences are mixed into the replay buffer alongside simulated ones rather than replacing them. This prevents catastrophic forgetting of the pretrained policy.
- Conservative deployment: A positive improvement_threshold prevents regression. The backup mechanism allows instant rollback.
- Per-cage filtering: The database supports filtering by cage_id, enabling future per-cage specialised models while sharing a single database.
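
The "pre-fill, don't replace" decision can be illustrated with a small ring buffer. `SimpleReplayBuffer` is a deliberately simplified stand-in, not the replay buffer class used by the training library, and the `prefill` helper, the dictionary keys, and the choice to skip rows without a next_state are assumptions made for this sketch.

```python
import random
from collections import deque


class SimpleReplayBuffer:
    """Minimal stand-in for a DQN replay buffer (fixed-capacity ring)."""

    def __init__(self, capacity: int) -> None:
        # When full, appending evicts the oldest transition.
        self._items = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done) -> None:
        self._items.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int, rng=random):
        # Uniform sampling without replacement over everything stored.
        items = list(self._items)
        return rng.sample(items, min(batch_size, len(items)))

    def __len__(self) -> int:
        return len(self._items)


def prefill(buffer: SimpleReplayBuffer, real_experiences) -> int:
    """Add real transitions on top of whatever the buffer already holds."""
    added = 0
    for exp in real_experiences:
        if exp["next_state"] is None:
            continue  # assumption: incomplete transitions are skipped
        buffer.add(exp["state"], exp["action"], exp["reward"],
                   exp["next_state"], exp["done"])
        added += 1
    return added
```

In a ring buffer like this one, the mix only survives while the capacity exceeds the number of real transitions added; once the buffer wraps, the oldest (simulated) entries are evicted first.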