Lecture topic: Reinforcement Learning in Non-homogeneous Environments
Speaker: Chengchun Shi (史成春)
Date: 2023-09-18 Time: 09:00
Venue: Room 206, School of Mathematical Sciences
Organizer: School of Mathematical Sciences
Speaker bio: Chengchun Shi is an Associate Professor at the London School of Economics and Political Science. He serves as an associate editor of JRSSB, JASA (T&M), and the Journal of Nonparametric Statistics. His research focuses on developing statistical learning methods in reinforcement learning, with applications to healthcare, ridesharing, video-sharing, and neuroimaging. He received the Royal Statistical Society Research Prize in 2021 and has also received IMS travel awards in three different years. Research interests: reinforcement learning, statistical inference.
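Abstract: This paper considers offline reinforcement learning (RL) in possibly non-stationary environments. Many existing RL algorithms in the literature rely on the stationarity assumption, which requires the system transition and reward functions to remain constant over time. In practice, however, the stationarity assumption is restrictive and is likely to be violated in many applications, including traffic signal control, robotics, and mobile health. In this paper, we propose a consistent procedure, based on pre-collected historical data, for testing the non-stationarity of the optimal policy without additional online data collection. Building on the proposed test, we further develop a sequential change point detection method that can be naturally coupled with existing state-of-the-art RL methods for policy optimization in non-stationary environments. The usefulness of our method is illustrated by theoretical results, simulation studies, and a real data example from the 2018 Intern Health Study. A Python implementation of the proposed method is available at https://github.com/limengbinggz/CUSUM-RL.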
All faculty and students are welcome to attend!