Adaptive Learning

How FeedRight continuously improves through real-world experience collection, persistent storage, and periodic model retraining.

The initial DQN model is pretrained in simulation. Adaptive learning closes the sim-to-real gap by collecting actual feeding outcomes from deployed cages and periodically retraining the model on this real-world data.

System Components

Component	Module	Role
ExperienceDatabase	`lib/dqn_sb3/experience_db.py`	Thread-safe SQLite storage for `(s, a, r, s', done)` tuples
AdaptiveRetrainingService	`lib/dqn_sb3/adaptive_retrain.py`	Orchestrates data loading, replay-buffer prefilling, training, evaluation and deployment
AdaptiveLearningMonitor	`lib/dqn_sb3/monitoring.py`	Dashboards, trend plots, and JSON report export

Experience Collection Flow

CageFeedingAgent                    ExperienceDatabase
       │                                   │
       │ decide_feeding(state)              │
       │ ─► stores _last_state, _last_action│
       │                                   │
       │       ... cage feeds fish ...     │
       │       ... outcome measured ...     │
       │                                   │
       │ record_outcome(reward, next_state) │
       │──────────────────────────────────►│
       │     INSERT INTO experiences        │
       │     (cage_id, state, action,       │
       │      feed_amount, reward,          │
       │      next_state, done, metadata)   │
       │                                   │
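The pairing of a decision with its later outcome can be sketched as below. This is a minimal illustration, not the actual `CageFeedingAgent`: the `SketchAgent` class, its `policy` callable, and the list used as a `sink` in place of `ExperienceDatabase` are all hypothetical, and `feed_amount` is left out for brevity. Only `decide_feeding`, `record_outcome`, `_last_state`, and `_last_action` are names taken from the diagram.

```python
class SketchAgent:
    """Hypothetical stand-in for CageFeedingAgent (policy and sink are illustrative)."""

    def __init__(self, cage_id, policy, sink):
        self.cage_id = cage_id
        self._policy = policy        # callable: state -> discrete action index
        self._sink = sink            # plain list standing in for ExperienceDatabase
        self._last_state = None
        self._last_action = None

    def decide_feeding(self, state):
        """Choose an action and remember the (state, action) pair for later."""
        action = self._policy(state)
        self._last_state = list(state)
        self._last_action = action
        return action

    def record_outcome(self, reward, next_state, done=False, metadata=None):
        """Complete the pending (s, a) pair with (r, s', done) and store it."""
        if self._last_state is None:
            raise RuntimeError("record_outcome called before decide_feeding")
        self._sink.append({
            "cage_id": self.cage_id,
            "state": self._last_state,
            "action": self._last_action,
            "reward": reward,
            "next_state": next_state,
            "done": done,
            "metadata": metadata or {},
        })
        # Clear the pending pair so one decision is never recorded twice.
        self._last_state = None
        self._last_action = None


stored = []
agent = SketchAgent("CAGE-001", policy=lambda state: 2, sink=stored)
action = agent.decide_feeding([0.0] * 44)
agent.record_outcome(reward=0.7, next_state=[0.1] * 44)
```

Clearing the pending pair after each write is a design choice of this sketch; it makes a second `record_outcome` call without a new decision fail loudly rather than store a duplicate.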

When to call `record_outcome`

The system runs a cron job that captures a data snapshot from all connected data sources every 30 minutes. The agent uses each snapshot to:

1. Measure actual consumption (e.g., via pellet waste sensors or video analysis).
2. Compute a reward from real metrics (actual FCR, actual waste rate, fish health indicators).
3. Capture the new state vector based on the snapshot.
4. Call `agent.record_outcome(reward=..., next_state=..., metadata={...})`.
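A hedged sketch of how one snapshot might be turned into a `record_outcome` call follows. The reward formula and its weights, the snapshot field names, the sample values, and the `StubAgent` class are assumptions made for illustration; the source only specifies the `record_outcome(reward=..., next_state=..., metadata={...})` call itself.

```python
from dataclasses import dataclass, field


def compute_reward(fcr: float, waste_rate: float, health_score: float) -> float:
    """Hypothetical reward: penalise FCR and waste, reward fish health."""
    return -1.0 * fcr - 2.0 * waste_rate + 0.5 * health_score


@dataclass
class StubAgent:
    """Stand-in for the real agent; it only records what it receives."""
    recorded: list = field(default_factory=list)

    def record_outcome(self, reward, next_state, metadata=None):
        self.recorded.append((reward, next_state, metadata or {}))


def process_snapshot(agent, snapshot: dict) -> float:
    """Turn one 30-minute snapshot into a recorded experience."""
    reward = compute_reward(
        snapshot["fcr"], snapshot["waste_rate"], snapshot["health_score"]
    )
    agent.record_outcome(
        reward=reward,
        next_state=snapshot["state_vector"],
        metadata={"captured_at": snapshot["captured_at"]},
    )
    return reward


# Illustrative values only; field names are not taken from FeedRight.
snapshot = {
    "fcr": 1.2,
    "waste_rate": 0.05,
    "health_score": 0.9,
    "state_vector": [0.0] * 44,
    "captured_at": "2024-01-01T12:00:00",
}
agent = StubAgent()
reward = process_snapshot(agent, snapshot)
```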

Database Schema

`experiences` table

Column	Type	Description
`id`	INTEGER PK	Auto-incrementing ID
`cage_id`	TEXT	Cage identifier (e.g., `"CAGE-001"`)
`timestamp`	DATETIME	When the experience occurred
`state`	TEXT (JSON)	44-element state vector serialised as JSON array
`action`	INTEGER	Discrete action index (0–5)
`feed_amount`	REAL	Actual kg dispensed
`reward`	REAL	Observed reward
`next_state`	TEXT (JSON)	Next state vector (nullable)
`done`	INTEGER	Episode termination flag
`metadata`	TEXT (JSON)	Extra info (safety overrides, confidence, weather)

Indexed on (cage_id, timestamp), timestamp, and cage_id for fast range queries.
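For orientation, the table and indexes described above could be created roughly as follows. Column names and types follow the table; the `NOT NULL` constraints, default values, and index names are assumptions and may differ from what `experience_db.py` actually does.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS experiences (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    cage_id     TEXT NOT NULL,
    timestamp   DATETIME DEFAULT CURRENT_TIMESTAMP,
    state       TEXT NOT NULL,      -- JSON array, 44 elements
    action      INTEGER NOT NULL,   -- discrete action index, 0-5
    feed_amount REAL,               -- kg dispensed
    reward      REAL,
    next_state  TEXT,               -- JSON array, nullable
    done        INTEGER DEFAULT 0,
    metadata    TEXT                -- JSON object
);
CREATE INDEX IF NOT EXISTS idx_exp_cage_time ON experiences (cage_id, timestamp);
CREATE INDEX IF NOT EXISTS idx_exp_time ON experiences (timestamp);
CREATE INDEX IF NOT EXISTS idx_exp_cage ON experiences (cage_id);
"""

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.executescript(SCHEMA)

# State vectors are serialised to JSON text before insertion.
conn.execute(
    "INSERT INTO experiences "
    "(cage_id, state, action, feed_amount, reward, next_state, done, metadata) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (
        "CAGE-001",
        json.dumps([0.0] * 44),
        3,
        12.5,
        0.8,
        json.dumps([0.1] * 44),
        0,
        json.dumps({"safety_override": False}),
    ),
)
conn.commit()

# A per-cage query ordered by time can use the (cage_id, timestamp) index.
row = conn.execute(
    "SELECT cage_id, state, action FROM experiences "
    "WHERE cage_id = ? ORDER BY timestamp",
    ("CAGE-001",),
).fetchone()
state = json.loads(row[1])
```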

`model_performance` table

Tracks evaluation metrics per model version over time.

`retraining_history` table

Logs every retraining attempt with status (started, data_loaded, completed, failed), performance improvement, error messages, and configuration.

Retraining Pipeline

The AdaptiveRetrainingService.retrain_model() method executes the following steps:

1. Check readiness
   └─ total experiences ≥ min_experiences (default 1 000)?

2. Load real data
   └─ ExperienceDatabase.get_training_dataset()
   └─ Optional filtering: cage_id, start_date, min_reward
   └─ Max 50 000 samples to cap memory

3. Evaluate baseline model
   └─ evaluate_policy(current_model, env, n_eval_episodes=20)
   └─ Records mean ± std reward

4. Pre-fill replay buffer
   └─ Iterate real (s, a, r, s', done) tuples
   └─ replay_buffer.add() for each experience
   └─ Buffer now mixes real + simulated data

5. Continue training
   └─ model.learn(total_timesteps=50 000, reset_num_timesteps=False)
   └─ Runs 50 000 additional steps and continues the existing step count

6. Evaluate retrained model
   └─ evaluate_policy(retrained_model, env, n_eval_episodes=20)
   └─ Compute improvement = retrained - baseline

7. Save & log
   └─ Save to models/retrained/model_{cage}_{timestamp}.zip
   └─ Log results to retraining_history table
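Steps 4 and 6 can be illustrated without any RL library. The sketch below uses a hypothetical `SimpleReplayBuffer` built on a `deque` rather than the replay buffer of the actual DQN implementation. Skipping rows whose `next_state` is null is an assumption about how incomplete experiences are handled.

```python
from collections import deque
from statistics import mean


class SimpleReplayBuffer:
    """Fixed-capacity FIFO buffer; a stand-in for a DQN replay buffer."""

    def __init__(self, capacity: int):
        # Once full, the oldest transitions are dropped first.
        self._items = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self._items.append((state, action, reward, next_state, done))

    def __len__(self):
        return len(self._items)


def prefill(buffer, experiences):
    """Step 4: add each real (s, a, r, s', done) tuple to the buffer."""
    added = 0
    for state, action, reward, next_state, done in experiences:
        if next_state is None:
            continue  # assumption: transitions without s' are unusable
        buffer.add(state, action, reward, next_state, done)
        added += 1
    return added


def improvement(baseline_rewards, retrained_rewards):
    """Step 6: difference between mean evaluation rewards."""
    return mean(retrained_rewards) - mean(baseline_rewards)


buffer = SimpleReplayBuffer(capacity=100)
for _ in range(10):
    # Simulated transitions that are already in the buffer.
    buffer.add([0.0] * 44, 0, 0.0, [0.0] * 44, False)

real_experiences = [
    ([1.0] * 44, 2, 0.5, [1.0] * 44, False),
    ([1.0] * 44, 1, 0.2, None, False),  # incomplete row, skipped
]
added = prefill(buffer, real_experiences)
gain = improvement([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Because the buffer is first-in, first-out, simulated data is only displaced once the combined number of transitions exceeds the capacity.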

Auto-Retrain Schedule

auto_retrain_schedule() is designed to be called periodically (e.g., weekly via cron):

result = service.auto_retrain_schedule(
    current_model_path="./models/best_model/best_model.zip",
    check_every_days=7,
    min_new_experiences=500,
    improvement_threshold=0.5
)

The flow:

1. Count the experiences recorded in the last check_every_days days.
2. If there are fewer than min_new_experiences, skip retraining.
3. Run the full retraining pipeline.
4. If improvement < improvement_threshold, save the retrained model but do not deploy it.
5. If improvement meets the threshold:
   - Back up the current production model with a timestamp suffix.
   - Copy the retrained model to the production path.
   - Return deployed: true.

This conservative deployment strategy ensures that a retrained model reaches production only when its evaluation reward exceeds the baseline by at least improvement_threshold.
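The deployment gate can be sketched with the standard library alone. The function name `deploy_if_better`, the backup file naming, and the shape of the returned dictionary are assumptions for illustration; the service's real return value may contain more fields.

```python
import shutil
from datetime import datetime
from pathlib import Path


def deploy_if_better(retrained_path, production_path, improvement,
                     improvement_threshold=0.5):
    """Back up and replace the production model only if the threshold is met."""
    retrained_path = Path(retrained_path)
    production_path = Path(production_path)

    if improvement < improvement_threshold:
        # The retrained model stays on disk but is not deployed.
        return {"deployed": False, "backup_path": None}

    backup_path = None
    if production_path.exists():
        # Keep a timestamped copy so the previous model can be restored.
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        backup_path = production_path.with_name(
            f"{production_path.stem}_backup_{stamp}{production_path.suffix}"
        )
        shutil.copy2(production_path, backup_path)

    shutil.copy2(retrained_path, production_path)
    return {"deployed": True, "backup_path": backup_path}
```

Rolling back then amounts to copying the backup file over the production path again.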

Monitoring

AdaptiveLearningMonitor provides:

Method	Output
`print_summary()`	Text report: total experiences, avg reward, retraining readiness
`plot_reward_trends()`	Scatter + moving average of rewards over time
`plot_action_distribution()`	Bar chart of action frequency + histogram of feed amounts
`plot_learning_progress()`	Cumulative average reward + windowed moving average
`export_report()`	JSON file with all metrics, action distribution, and readiness status

All plotting methods accept an optional cage_id filter and a save_path for headless environments.
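The two curves behind `plot_learning_progress()` are simple to compute. The helpers below are a sketch of the arithmetic only, not the monitor's code; in particular, averaging over a shorter window at the start of the series is an assumption.

```python
from collections import deque


def moving_average(values, window):
    """Trailing moving average; early points average what is available so far."""
    out, buf, total = [], deque(), 0.0
    for value in values:
        buf.append(value)
        total += value
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that left the window
        out.append(total / len(buf))
    return out


def cumulative_average(values):
    """Running mean of all values seen so far."""
    out, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        out.append(total / count)
    return out


rewards = [1.0, 2.0, 3.0, 4.0]
windowed = moving_average(rewards, window=2)   # [1.0, 1.5, 2.5, 3.5]
cumulative = cumulative_average(rewards)       # [1.0, 1.5, 2.0, 2.5]
```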

Key Design Decisions

SQLite over cloud DB: Keeps the system self-contained and deployable on edge hardware at the farm site. Thread-safety is handled via threading.local() connections.
Pre-fill, don't replace: Real experiences are mixed into the replay buffer alongside simulated ones rather than replacing them. This prevents catastrophic forgetting of the pretrained policy.
Conservative deployment: A positive improvement_threshold guards against deploying a model that evaluates worse than the current one. The timestamped backup allows rollback to the previous production model.
Per-cage filtering: The database supports filtering by cage_id, enabling future per-cage specialised models while sharing a single database.
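The thread-safety approach mentioned above, one SQLite connection per thread held in `threading.local()`, might look like this. The class name `ThreadLocalConnections` and its `get()` method are hypothetical; the source only states that `threading.local()` connections are used.

```python
import sqlite3
import threading


class ThreadLocalConnections:
    """Hands each thread its own lazily created SQLite connection."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._local = threading.local()

    def get(self):
        # Attributes set on threading.local() are visible only to the
        # thread that set them, so connections are never shared.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self._db_path)
            self._local.conn = conn
        return conn


pool = ThreadLocalConnections(":memory:")
main_conn = pool.get()

shared_with_worker = []
worker = threading.Thread(
    target=lambda: shared_with_worker.append(pool.get() is main_conn)
)
worker.start()
worker.join()
```

This sidesteps the default restriction in Python's `sqlite3` module that a connection may only be used from the thread that created it.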
