Adaptive Learning

How FeedRight continuously improves through real-world experience collection, persistent storage, and periodic model retraining.

The initial DQN model is pretrained in simulation. Adaptive learning closes the sim-to-real gap by collecting actual feeding outcomes from deployed cages and periodically retraining the model on this real-world data.

System Components

| Component | Module | Role |
| --- | --- | --- |
| ExperienceDatabase | lib/dqn_sb3/experience_db.py | Thread-safe SQLite storage for (s, a, r, s', done) tuples |
| AdaptiveRetrainingService | lib/dqn_sb3/adaptive_retrain.py | Orchestrates data loading, replay-buffer prefilling, training, evaluation, and deployment |
| AdaptiveLearningMonitor | lib/dqn_sb3/monitoring.py | Dashboards, trend plots, and JSON report export |

Experience Collection Flow

CageFeedingAgent                    ExperienceDatabase
       │                                     │
       │ decide_feeding(state)               │
       │ ─► stores _last_state, _last_action │
       │                                     │
       │       ... cage feeds fish ...       │
       │       ... outcome measured ...      │
       │                                     │
       │ record_outcome(reward, next_state)  │
       │────────────────────────────────────►│
       │     INSERT INTO experiences         │
       │     (cage_id, state, action,        │
       │      feed_amount, reward,           │
       │      next_state, done, metadata)    │
       │                                     │

When to call record_outcome

A cron job captures a data snapshot from all connected data sources every 30 minutes. After each snapshot, the agent uses it to:

  1. Measure actual consumption (e.g., via pellet waste sensors or video analysis).
  2. Compute a reward from real metrics (actual FCR, actual waste rate, fish health indicators).
  3. Capture the new state vector based on the snapshot.
  4. Call agent.record_outcome(reward=..., next_state=..., metadata={...}).
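
The pairing of decide_feeding and record_outcome can be sketched as below. This is a minimal stand-in, not FeedRight's CageFeedingAgent: the class name SketchAgent, the in-memory store list, the placeholder action, and the example metadata key are assumptions for illustration. Only the _last_state / _last_action fields and the record_outcome(reward=..., next_state=..., metadata=...) call shape come from the text above.

```python
from typing import Optional


class SketchAgent:
    """Hypothetical stand-in for CageFeedingAgent (illustration only)."""

    def __init__(self, cage_id: str, store: list) -> None:
        self.cage_id = cage_id
        self.store = store  # stands in for ExperienceDatabase
        self._last_state: Optional[list] = None
        self._last_action: Optional[int] = None

    def decide_feeding(self, state: list) -> int:
        action = 0  # placeholder policy; the real agent would query the DQN
        # Remember the decision so the later outcome can be paired with it.
        self._last_state, self._last_action = list(state), action
        return action

    def record_outcome(self, reward: float, next_state: list,
                       done: bool = False,
                       metadata: Optional[dict] = None) -> bool:
        # Without a pending decision there is nothing to pair the outcome with.
        if self._last_state is None:
            return False
        self.store.append({
            "cage_id": self.cage_id,
            "state": self._last_state,
            "action": self._last_action,
            "reward": reward,
            "next_state": list(next_state),
            "done": done,
            "metadata": metadata or {},
        })
        # Clear the pending decision so one outcome is stored per decision.
        self._last_state = self._last_action = None
        return True


# One decision followed by one snapshot-driven outcome.
store: list = []
agent = SketchAgent("CAGE-001", store)
agent.decide_feeding([0.0] * 44)
first = agent.record_outcome(reward=1.2, next_state=[0.1] * 44,
                             metadata={"source": "snapshot"})
second = agent.record_outcome(reward=0.5, next_state=[0.2] * 44)
```

In this sketch a second record_outcome call without a new decide_feeding call is ignored, which avoids storing an outcome against a stale state. Whether the real agent behaves this way is not stated in the source.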

Database Schema

experiences table

| Column | Type | Description |
| --- | --- | --- |
| id | INTEGER PK | Auto-incrementing ID |
| cage_id | TEXT | Cage identifier (e.g., "CAGE-001") |
| timestamp | DATETIME | When the experience occurred |
| state | TEXT (JSON) | 44-element state vector serialised as a JSON array |
| action | INTEGER | Discrete action index (0–5) |
| feed_amount | REAL | Actual kg dispensed |
| reward | REAL | Observed reward |
| next_state | TEXT (JSON) | Next state vector (nullable) |
| done | INTEGER | Episode termination flag |
| metadata | TEXT (JSON) | Extra info (safety overrides, confidence, weather) |

Indexed on (cage_id, timestamp), timestamp, and cage_id for fast range queries.
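
As a rough illustration of this schema, the sketch below creates the table and its three indexes with Python's built-in sqlite3 module and round-trips one experience. Column names and types follow the table above. The index names, the NOT NULL / DEFAULT constraints, and the sample values are assumptions, not taken from experience_db.py.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.executescript("""
CREATE TABLE IF NOT EXISTS experiences (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    cage_id     TEXT NOT NULL,
    timestamp   DATETIME DEFAULT CURRENT_TIMESTAMP,
    state       TEXT NOT NULL,      -- JSON array
    action      INTEGER NOT NULL,
    feed_amount REAL,
    reward      REAL,
    next_state  TEXT,               -- JSON array, nullable
    done        INTEGER DEFAULT 0,
    metadata    TEXT                -- JSON object
);
-- Index names are hypothetical; the indexed columns come from the docs.
CREATE INDEX IF NOT EXISTS idx_exp_cage_time ON experiences (cage_id, timestamp);
CREATE INDEX IF NOT EXISTS idx_exp_time ON experiences (timestamp);
CREATE INDEX IF NOT EXISTS idx_exp_cage ON experiences (cage_id);
""")

state = [0.0] * 44
conn.execute(
    "INSERT INTO experiences "
    "(cage_id, state, action, feed_amount, reward, next_state, done, metadata) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("CAGE-001", json.dumps(state), 3, 12.5, 0.8,
     json.dumps(state), 0, json.dumps({"weather": "calm"})),
)
conn.commit()

# Range query by cage, ordered by time; JSON columns are decoded on read.
row = conn.execute(
    "SELECT state, action, next_state FROM experiences "
    "WHERE cage_id = ? ORDER BY timestamp",
    ("CAGE-001",),
).fetchone()
decoded_state = json.loads(row[0])
```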

model_performance table

Tracks evaluation metrics per model version over time.

retraining_history table

Logs every retraining attempt with status (started, data_loaded, completed, failed), performance improvement, error messages, and configuration.

Retraining Pipeline

The AdaptiveRetrainingService.retrain_model() method executes the following steps:

1. Check readiness
   └─ total experiences ≥ min_experiences (default 1 000)?

2. Load real data
   └─ ExperienceDatabase.get_training_dataset()
   └─ Optional filtering: cage_id, start_date, min_reward
   └─ Max 50 000 samples to cap memory

3. Evaluate baseline model
   └─ evaluate_policy(current_model, env, n_eval_episodes=20)
   └─ Records mean ± std reward

4. Pre-fill replay buffer
   └─ Iterate real (s, a, r, s', done) tuples
   └─ replay_buffer.add() for each experience
   └─ Buffer now mixes real + simulated data

5. Continue training
   └─ model.learn(total_timesteps=additional_timesteps), default 50 000
   └─ reset_num_timesteps=False (continues step count)

6. Evaluate retrained model
   └─ evaluate_policy(retrained_model, env, n_eval_episodes=20)
   └─ Compute improvement = retrained - baseline

7. Save & log
   └─ Save to models/retrained/model_{cage}_{timestamp}.zip
   └─ Log results to retraining_history table
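
The gating and scoring arithmetic in steps 1, 2, 3, and 6 can be sketched in plain Python. This is an illustration under stated assumptions, not the code in adaptive_retrain.py: the function names are hypothetical, keeping the most recent rows when capping is an assumption (the source only gives the 50 000 limit), and the population standard deviation is used for the "± std" figure.

```python
from statistics import mean, pstdev

MIN_EXPERIENCES = 1_000  # default from step 1
MAX_SAMPLES = 50_000     # memory cap from step 2


def is_ready(total_experiences: int,
             min_experiences: int = MIN_EXPERIENCES) -> bool:
    """Step 1: retrain only once enough real experiences exist."""
    return total_experiences >= min_experiences


def cap_samples(rows: list, max_samples: int = MAX_SAMPLES) -> list:
    """Step 2: bound memory use. Keeping the newest rows is an assumption."""
    return rows[-max_samples:] if max_samples > 0 else []


def summarize(episode_rewards: list) -> tuple:
    """Steps 3 and 6: mean and standard deviation over evaluation episodes."""
    return mean(episode_rewards), pstdev(episode_rewards)


def improvement(baseline_rewards: list, retrained_rewards: list) -> float:
    """Step 6: improvement = retrained mean reward - baseline mean reward."""
    return mean(retrained_rewards) - mean(baseline_rewards)


# Made-up episode rewards, purely to exercise the functions.
baseline = [1.0, 2.0, 3.0]
retrained = [2.0, 3.0, 4.0]
gain = improvement(baseline, retrained)
capped = cap_samples(list(range(60_000)))
```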

Auto-Retrain Schedule

auto_retrain_schedule() is designed to be called periodically (e.g., weekly via cron):

result = service.auto_retrain_schedule(
    current_model_path="./models/best_model/best_model.zip",
    check_every_days=7,
    min_new_experiences=500,
    improvement_threshold=0.5
)

The flow:

  1. Count the experiences recorded in the last check_every_days days.
  2. If there are fewer than min_new_experiences, skip retraining.
  3. Otherwise, run the full retraining pipeline.
  4. If improvement < improvement_threshold, save the retrained model but do not deploy it.
  5. If improvement ≥ improvement_threshold:
    • Backup the current production model with a timestamp suffix.
    • Copy the retrained model to the production path.
    • Return deployed: true.

This conservative deployment strategy ensures that a retrained model reaches production only when its mean evaluation reward exceeds the baseline's by at least improvement_threshold.
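
The deployment decision in the flow above can be sketched with the standard library alone. The function name deploy_if_better, the returned dictionary keys other than deployed, and the timestamp format are assumptions; the source specifies only a timestamp suffix on the backup and a copy to the production path.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def deploy_if_better(retrained_path, production_path, improvement,
                     improvement_threshold=0.5):
    """Back up and replace the production model only if the threshold is met."""
    retrained, production = Path(retrained_path), Path(production_path)
    if improvement < improvement_threshold:
        # The retrained model stays on disk; production is untouched.
        return {"deployed": False, "backup_path": None}

    backup = None
    if production.exists():
        # Timestamp format is an assumption for this sketch.
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        backup = production.with_name(
            f"{production.stem}_{stamp}{production.suffix}")
        shutil.copy2(production, backup)

    production.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(retrained, production)
    return {"deployed": True, "backup_path": str(backup) if backup else None}


# Usage with throwaway files standing in for model archives.
workdir = Path(tempfile.mkdtemp())
prod = workdir / "best_model.zip"
new = workdir / "retrained.zip"
prod.write_bytes(b"old")
new.write_bytes(b"new")

skipped = deploy_if_better(new, prod, improvement=0.2)
content_after_skip = prod.read_bytes()
deployed = deploy_if_better(new, prod, improvement=0.7)
```

Rolling back in this sketch amounts to copying the file at backup_path over the production path again.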

Monitoring

AdaptiveLearningMonitor provides:

| Method | Output |
| --- | --- |
| print_summary() | Text report: total experiences, avg reward, retraining readiness |
| plot_reward_trends() | Scatter + moving average of rewards over time |
| plot_action_distribution() | Bar chart of action frequency + histogram of feed amounts |
| plot_learning_progress() | Cumulative average reward + windowed moving average |
| export_report() | JSON file with all metrics, action distribution, and readiness status |

All plotting methods accept an optional cage_id filter and a save_path for headless environments.
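
The two smoothing curves named in the table, a windowed moving average and a cumulative average, can be computed as in the sketch below. It is not the monitor's implementation: the function names are hypothetical, and averaging over a shorter window until enough points have arrived is an assumption about the warm-up behaviour.

```python
from collections import deque


def moving_average(values, window):
    """Trailing mean over at most `window` points (shorter during warm-up)."""
    out, buf, total = [], deque(), 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the point that left the window
        out.append(total / len(buf))
    return out


def cumulative_average(values):
    """Mean of all points seen so far, one value per input point."""
    out, total = [], 0.0
    for i, v in enumerate(values, start=1):
        total += v
        out.append(total / i)
    return out


rewards = [1.0, 2.0, 3.0, 4.0]  # made-up rewards for illustration
smoothed = moving_average(rewards, window=2)
running = cumulative_average(rewards)
```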

Key Design Decisions

  1. SQLite over cloud DB: Keeps the system self-contained and deployable on edge hardware at the farm site. Thread safety is handled by giving each thread its own connection, stored in threading.local().
  2. Pre-fill, don't replace: Real experiences are mixed into the replay buffer alongside simulated ones rather than replacing them. This prevents catastrophic forgetting of the pretrained policy.
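  3. Conservative deployment: A positive improvement_threshold keeps models that score worse than, or only marginally better than, the baseline out of production. The timestamped backup allows rollback by restoring the previous model file.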
  4. Per-cage filtering: The database supports filtering by cage_id, enabling future per-cage specialised models while sharing a single database.
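
The per-thread connection pattern mentioned in the first design decision might look like the following. It is a minimal sketch assuming one lazily created sqlite3 connection per thread; the class name ThreadLocalConnections and its get() method are hypothetical and not taken from experience_db.py.

```python
import os
import sqlite3
import tempfile
import threading


class ThreadLocalConnections:
    """Hands each thread its own lazily created SQLite connection."""

    def __init__(self, db_path: str) -> None:
        self.db_path = db_path
        self._local = threading.local()

    def get(self) -> sqlite3.Connection:
        conn = getattr(self._local, "conn", None)
        if conn is None:
            # First call from this thread: open and remember a connection.
            conn = sqlite3.connect(self.db_path)
            self._local.conn = conn
        return conn


# A file-backed database so that all threads see the same data.
db_path = os.path.join(tempfile.mkdtemp(), "experiences.db")
pool = ThreadLocalConnections(db_path)
main_conn = pool.get()
results = []


def worker() -> None:
    conn = pool.get()
    results.append(conn is not main_conn)  # distinct connection per thread
    results.append(pool.get() is conn)     # reused within the same thread
    conn.close()


t = threading.Thread(target=worker)
t.start()
t.join()
```

By default, sqlite3 connection objects refuse to be used from a thread other than the one that created them, which is why the sketch never shares a connection across threads.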
