            COMP9414 24T2
            Artificial Intelligence
            Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
            1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment,
you are asked to implement the Q-learning and SARSA methods for a taxi
navigation problem. To run your experiments and test your code, you should
make use of the Gym library, an open-source Python library for developing
and comparing reinforcement learning algorithms. You can install Gym on
your computer simply by using the following command in your command
prompt:

pip install gym
            In the taxi navigation problem, there are four designated locations in the
            grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
            episode starts, one taxi starts off at a random square and the passenger is
            at a random location (one of the four specified locations). The taxi drives
            to the passenger’s location, picks up the passenger, drives to the passenger’s
            destination (another one of the four specified locations), and then drops off
            the passenger. Once the passenger is dropped off, the episode ends. To show
            the taxi grid world environment, you can use the following code:

import gym

env = gym.make("Taxi-v3", render_mode="ansi").env
state = env.reset()  # note: in Gym >= 0.26, reset() returns a (state, info) tuple
rendered_env = env.render()
print(rendered_env)
In order to render the environment, there are three modes known as
"human", "rgb_array", and "ansi". The "human" mode visualizes the
environment in a way suitable for human viewing, and the output is a graphical
window that displays the current state of the environment (see Fig. 1). The
"rgb_array" mode provides the environment's state as an RGB image, and
the output is a numpy array representing the RGB image of the environment.
            the output is a numpy array representing the RGB image of the environment.
            The “ansi” mode provides a text-based representation of the environment’s
            state, and the output is a string that represents the current state of the
            environment using ASCII characters (see Fig. 2).
Figure 1: "human" mode presentation for the taxi navigation problem in
the Gym library.
            You are free to choose the presentation mode between “human” and
            “ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
            description, there are six discrete deterministic actions that are presented in
            Table 1.
Table 1: Six possible actions in the taxi navigation environment.

    Action               Number of the action
    Move South           0
    Move North           1
    Move East            2
    Move West            3
    Pickup Passenger     4
    Drop off Passenger   5

Figure 2: "ansi" mode presentation for the taxi navigation problem in the
Gym library. Gold represents the taxi location, blue is the pickup location,
and purple is the drop-off location.

For this assignment, you need to implement the Q-learning and SARSA
algorithms for the taxi navigation environment. The main objective for this
assignment is for the agent (taxi) to learn how to navigate the grid world
and deliver the passenger in the minimum possible number of steps. To
accomplish the learning task, you should empirically determine the
hyperparameters, e.g., the learning rate α, the exploration parameters (such
as ε or T), and the discount factor γ for your algorithm. Your agent should
be penalized -1 per step it takes, receive a +20 reward for delivering the
passenger, and incur a -10 penalty for executing the "pickup" and "drop-off"
actions illegally. You should try different exploration parameters to find the
best balance between exploration and exploitation.
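For illustration, below is a minimal Q-learning training sketch with ε-greedy
exploration. It is written against the Gym >= 0.26 API (reset() returns a
(state, info) tuple and step() returns five values; older versions differ), and
all hyperparameter values and the episode-length cap are placeholders that
you should tune empirically, not prescribed settings.

import numpy as np
import gym

# Placeholder hyperparameters -- tune these empirically.
alpha, gamma, epsilon = 0.1, 0.9, 0.1
n_episodes = 1000

env = gym.make("Taxi-v3", render_mode="ansi").env
Q = np.zeros((env.observation_space.n, env.action_space.n))

def epsilon_greedy(state):
    # Explore with probability epsilon; otherwise act greedily on Q.
    if np.random.rand() < epsilon:
        return env.action_space.sample()
    return int(np.argmax(Q[state]))

for episode in range(n_episodes):
    state, _ = env.reset()
    for step in range(200):  # cap episode length during learning
        action = epsilon_greedy(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        # Q-learning (off-policy) update: bootstrap on the best next action.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
        if terminated or truncated:
            break

SARSA is structurally identical but on-policy: select next_action with the
same ε-greedy policy before the update, and use Q[next_state, next_action]
as the bootstrap target in place of np.max(Q[next_state]).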
            As an outcome, you should plot the accumulated reward per episode and
            the number of steps taken by the agent in each episode for at least 1000
            learning episodes for both the Q-learning and SARSA algorithms. Examples
            of these two plots are shown in Figures 3–6. Please note that the provided
            plots are just examples and, therefore, your plots will not be exactly like the
            provided ones, as the learning parameters will differ for your algorithm.
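One way to produce these two plots, assuming you have accumulated
per-episode totals during training in lists called episode_rewards and
episode_steps (illustrative names, not part of any required interface):

import matplotlib.pyplot as plt

# episode_rewards and episode_steps are assumed to hold one value per
# training episode, collected inside the training loop.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(episode_rewards)
ax1.set_xlabel("Episode")
ax1.set_ylabel("Accumulated reward")
ax2.plot(episode_steps)
ax2.set_xlabel("Episode")
ax2.set_ylabel("Steps")
plt.tight_layout()
plt.show()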
Figure 3: Q-learning reward. Figure 4: Q-learning steps.
Figure 5: SARSA reward. Figure 6: SARSA steps.

After training your algorithm, you should save your Q-values. Based on
your saved Q-table, your algorithms will be tested on at least 100 random
grid-world scenarios with the same characteristics as the taxi environment,
for both the Q-learning and SARSA algorithms, using the greedy action
selection method. Therefore, your Q-table will not be updated during
testing.
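A possible sketch of this save, reload, and greedy-test step is shown below;
the .npy file name is illustrative, and the code assumes the env and Q
objects from the earlier training sketch:

import numpy as np

np.save("q_table_qlearning.npy", Q)   # persist the trained Q-table
Q = np.load("q_table_qlearning.npy")  # reload it for testing

# Greedy evaluation: the Q-table is never updated here.
n_test_episodes, max_steps = 100, 100
test_rewards, test_steps = [], []
for _ in range(n_test_episodes):
    state, _ = env.reset()
    total_reward = 0
    for step in range(max_steps):
        action = int(np.argmax(Q[state]))  # greedy action selection
        state, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    test_rewards.append(total_reward)
    test_steps.append(step + 1)

print("Average steps per episode:", np.mean(test_steps))
print("Average accumulated reward:", np.mean(test_rewards))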
            Your code should be able to visualize the trained agent for both the Q-
            learning and SARSA algorithms. This means you should render the “Taxi-
            v3” environment (you can use the “ansi” mode) and run your trained agent
            from a random position. You should present the steps your agent is taking
            and how the reward changes from one state to another. An example of the
            visualized agent is shown in Fig. 7, where only the first six steps of the taxi
            are displayed.
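One way to produce such a visualization, assuming the "ansi" render mode
and the trained Q-table from the sketches above, is to print the rendered
grid, the chosen action, and the running reward at every greedy step:

import numpy as np

state, _ = env.reset()
total_reward = 0
for step in range(100):
    print(env.render())  # text grid of the current state ("ansi" mode)
    action = int(np.argmax(Q[state]))
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    print(f"Step {step + 1}: action={action}, reward={reward}, accumulated={total_reward}")
    if terminated or truncated:
        break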
            2 Testing and discussing your code
As part of the assignment evaluation, your code will be tested by your tutor,
together with you, in a discussion carried out during the tutorial session in week 10.
            The assignment has a total of 25 marks. The discussion is mandatory and,
            therefore, we will not mark any assignment not discussed with tutors.
            Before your discussion session, you should prepare the necessary code for
            this purpose by loading your Q-table and the “Taxi-v3” environment. You
            should be able to calculate the average number of steps per episode and the
average accumulated reward (for a maximum of 100 steps per episode)
for the test episodes, using the greedy action selection method.

Figure 7: The first six steps of a trained agent (taxi) based on the
Q-learning algorithm.
            You are expected to propose and build your algorithms for the taxi nav-
            igation task. You will receive marks for each of these subsections as shown
in Table 2. Beyond what has been mentioned in the previous section, it is
fine if you want to include any other outcomes to highlight particular aspects
when testing and discussing your code with your tutor.
For both the Q-learning and SARSA algorithms, your tutor will consider the
average accumulated reward and the average number of steps taken over the
test episodes in the environment, with a maximum of 100 steps per episode.
For your Q-learning algorithm, the agent should take at most 13 steps per
episode on average and obtain an average accumulated reward of at least 7.
Numbers worse than that will result in 0 marks for that specific section.
For your SARSA algorithm, the agent should take at most 15 steps per
episode on average and obtain an average accumulated reward of at least 5.
Numbers worse than that will result in 0 marks for that specific section.
            Finally, you will receive 1 mark for code readability for each task, and
            your tutor will also give you a maximum of 5 marks for each task depending
            on the level of code understanding as follows: 5. Outstanding, 4. Great,
            3. Fair, 2. Low, 1. Deficient, 0. No answer.
Table 2: Marks for each task.

Task                                                                        Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for Q-learning           2 marks
  Accumulated rewards and steps per episode plots for SARSA                2 marks
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per episode for
  Q-learning                                                               2.5 marks
  Average accumulated rewards and average steps per episode for SARSA      2.5 marks
  Visualizing the trained agent for Q-learning                             2 marks
  Visualizing the trained agent for SARSA                                  2 marks
Code understanding and discussion
  Code readability for Q-learning                                          1 mark
  Code readability for SARSA                                               1 mark
  Code understanding and discussion for Q-learning                         5 marks
  Code understanding and discussion for SARSA                              5 marks
Total marks                                                                25 marks
            3 Submitting your assignment
The assignment must be done individually. You must submit your assignment
solution via Moodle. This will consist of a single .zip file containing three
files: your .ipynb Jupyter code and your saved Q-tables for Q-learning and
SARSA (you can choose the format for the Q-tables). Remember that your
Q-table files will be loaded during your discussion session to run the test
episodes; therefore, you should also provide a script in your Python code at
submission to perform these tests. Additionally, your code should include
short text descriptions to help markers better understand your code.
            Please be mindful that providing clean and easy-to-read code is a part of
            your assignment.
            Please indicate your full name and your zID at the top of the file as a
            comment. You can submit as many times as you like before the deadline –
later submissions overwrite earlier ones. After submitting your file, a good
practice is to take a screenshot of it for future reference.
Late submission penalty: UNSW has a standard late submission penalty
of 5% of your mark per day, capped at five days from the assessment
deadline; after that, students cannot submit the assignment.
            4 Deadline and questions
Deadline: Week 9, Wednesday, 24 July 2024, 11:55 PM. Please use the
forum on Moodle to ask questions related to the project. We will prioritise
questions asked in the forum. However, you should not post your code there,
to avoid making it public and enabling possible plagiarism. If your question
involves your code, use the course email cs9414@cse.unsw.edu.au instead.
Although we try to answer questions as quickly as possible, we might take
up to 1 or 2 business days to reply; therefore, last-minute questions might
not be answered in time.
For any questions regarding the discussion sessions, please contact your
tutor directly. You can find your tutor's email address in Table 3.
            5 Plagiarism policy
            Your program must be entirely your own work. Plagiarism detection software
            might be used to compare submissions pairwise (including submissions for
            any similar projects from previous years) and serious penalties will be applied,
            particularly in the case of repeat offences.
            Do not copy from others. Do not allow anyone to see your code.
            Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
            require further clarification on this matter.