Once you have created an environment and reinforcement learning agent, you can train the agent in the environment using the train function. To configure your training, use an rlTrainingOptions object. For example, create a training option set opt, and train agent agent within environment env.
opt = rlTrainingOptions(...
    'MaxEpisodes',1000,...
    'MaxStepsPerEpisode',1000,...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',480);
trainResults = train(agent,env,opt);

For more information on creating agents, see Reinforcement Learning Agents. For more information on creating environments, see Create MATLAB Reinforcement Learning Environments and Create Simulink Reinforcement Learning Environments.

Note: Because agents are handle objects, train updates the agent in place as training progresses. To preserve the original agent parameters for later use, save the agent to a MAT-file:

save("initialAgent.mat","agent")

If you copy the agent into a new variable, the new variable will also always point to the most recent agent version with updated parameters. For more information about handle objects, see Handle Object Behavior.
Training terminates automatically when the conditions you specify in the StopTrainingCriteria and StopTrainingValue options of your rlTrainingOptions object are satisfied. When training terminates, the training statistics and results are stored in the trainResults object. Because trainResults contains the complete training state, you can use it as an input argument to resume training from the exact point at which the previous session ended:

trainResults = train(agent,env,trainResults);

This starts the training from the last values of the agent parameters and training results object obtained after the previous train call. The trainResults object contains, among its properties, the rlTrainingOptions object specifying the training option set. Therefore, to restart the training with updated training options, first change the training options in trainResults using dot notation. If the maximum number of episodes was already reached in the previous training session, you must increase the maximum number of episodes. For example, disable displaying the training progress on the Episode Manager, enable the Verbose option to display training progress at the command line, change the maximum number of episodes to 2000, and then restart the training, returning a new trainResults object as output:

trainResults.TrainingOptions.MaxEpisodes = 2000;
trainResults.TrainingOptions.Plots = "none";
trainResults.TrainingOptions.Verbose = 1;
trainResultsNew = train(agent,env,trainResults);

Note: When training terminates, the agent reflects its state at the end of the final training episode. Because of continued exploration, the rewards obtained by the final agent are not necessarily the highest achieved during training. To save candidate agents during training, use the SaveAgentCriteria and SaveAgentValue options described below.

Training Algorithm

In general, training performs the following steps.
1. Initialize the agent.
2. For each episode:
   a. Reset the environment and get the initial observation.
   b. Compute the initial action from the initial observation using the current policy.
   c. While the episode is not finished, apply the current action to the environment, obtain the next observation and the reward, learn from this experience, and compute the next action.
3. If the training termination condition is met, terminate training. Otherwise, begin the next episode.
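As a rough illustration only, the loop below sketches a single episode of this process using the toolbox reset, step, and getAction functions. It is a simplification, not the actual implementation of train; in particular, the learning update is only indicated by a comment.

obs = reset(env);                        % reset environment, get s0
isDone = false;
episodeReward = 0;
while ~isDone
    action = getAction(agent,{obs});     % compute action a from observation s
    [nextObs,reward,isDone] = step(env,action{1});
    % here train additionally updates the agent parameters,
    % using the experience (s,a,r,s')
    episodeReward = episodeReward + reward;
    obs = nextObs;                       % s <- s'
end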
The specifics of how the software performs these steps depend on the configuration of the agent and environment. For instance, resetting the environment at the start of each episode can include randomizing initial state values, if you configure your environment to do so. For more information on agents and their training algorithms, see Reinforcement Learning Agents. To use parallel processing and GPUs to speed up training, see Train Agents Using Parallel Computing and GPUs.

Episode Manager

By default, calling the train function opens the Reinforcement Learning Episode Manager, which lets you visualize the training progress. The Episode Manager plot shows the reward for each episode (EpisodeReward) and a running average reward value (AverageReward). For agents with a critic, Episode Q0 is the estimate of the discounted long-term reward at the start of each episode, given the initial observation of the environment. As training progresses,
if the critic is well designed and learns successfully, Episode Q0 approaches, on average, the true discounted long-term reward, which may be offset from the EpisodeReward value because of discounting. For a well-designed critic using an undiscounted reward (DiscountFactor equal to 1), Episode Q0 approaches, on average, the true episode reward. The Episode Manager also displays various episode and
training statistics. You can also use the train function to return episode and training information as output. To turn off the Episode Manager, set the Plots option of rlTrainingOptions to "none".

Save Candidate Agents

During training, you can save candidate agents that meet conditions you specify in the SaveAgentCriteria and SaveAgentValue options of your rlTrainingOptions object. For instance, you can save any agent whose episode reward exceeds a certain value, even if the overall condition for terminating training is not yet satisfied. For example, save agents when the episode reward is greater than 100:

opt = rlTrainingOptions('SaveAgentCriteria',"EpisodeReward",'SaveAgentValue',100);

train stores saved agents in a MAT-file in the folder you specify using the SaveAgentDirectory option of rlTrainingOptions. Saved agents can be useful, for instance, to test candidate agents generated during a long-running training process.
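You might later inspect one of these candidates. In the sketch below, the file name Agent250.mat and the variable name saved_agent are assumptions (they depend on your release and settings), so check the contents of your own saved files:

% Hypothetical example: load a candidate saved after episode 250.
data = load(fullfile(opt.SaveAgentDirectory,"Agent250.mat"));
candidateAgent = data.saved_agent;   % variable name may differ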
After training is complete, you can save the final trained agent from the MATLAB workspace using the save command. For example, save the agent to the file finalAgent.mat in the folder specified by SaveAgentDirectory:

save(opt.SaveAgentDirectory + "/finalAgent.mat",'agent')

By default, when DDPG and DQN agents are saved, the experience buffer data is not saved. If you plan to further train your saved agent, you can start training with the previous experience buffer as a starting point. In this case, set the SaveExperienceBufferWithAgent option to true.
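A minimal sketch follows; these option names match older releases of the DDPG and DQN agent options objects, so verify them against your toolbox version:

% Keep the experience buffer when saving, and do not clear it at the
% start of the next training session.
agent.AgentOptions.SaveExperienceBufferWithAgent = true;
agent.AgentOptions.ResetExperienceBufferBeforeTraining = false;
save("preTrainedAgent.mat","agent")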
Validate Trained Policy

To validate your trained agent, you can simulate the agent within the training environment using the sim function. To configure the simulation, use rlSimulationOptions. When validating your agent, consider checking how your agent handles the following (a minimal simulation sketch appears after this list):

- Changes to simulation initial conditions
- Mismatches between the training and simulation environment dynamics
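The following sketch shows a basic validation run; the MaxSteps value and the reward aggregation are illustrative choices, not requirements:

simOpts = rlSimulationOptions('MaxSteps',1000);
experience = sim(env,agent,simOpts);
totalReward = sum(experience.Reward.Data);   % Reward is a timeseries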
As with parallel training, if you have Parallel Computing Toolbox™ software, you can run multiple parallel simulations on multicore computers. If you have MATLAB Parallel Server™ software, you can run multiple parallel simulations on computer clusters or cloud resources. For more information on configuring your simulation to use parallel computing, see the UseParallel and ParallelizationOptions options of rlSimulationOptions.
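For instance, a sketch assuming Parallel Computing Toolbox is available:

simOpts = rlSimulationOptions(...
    'MaxSteps',1000,...
    'NumSimulations',4,...    % run four independent simulations
    'UseParallel',true);      % distribute them over a parallel pool
experiences = sim(env,agent,simOpts);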
Environment Visualization

If your training environment implements the plot method, you can visualize the environment behavior during training and simulation. If you call plot(env) before training or simulation, the visualization updates during training, allowing you to follow the progress of each episode or simulation.

Environment visualization is not supported when training or simulating your agent using parallel computing.

For custom environments, you must implement your own plot method.
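In either case, call plot on the environment before training to enable live visualization. For example:

plot(env)                             % open the environment visualization
trainResults = train(agent,env,opt); % the plot updates as episodes run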
See Also

Related Topics