Risk-averse Learning with Non-Stationary Distributions
arXiv (2024)
Abstract
Considering non-stationary environments in online optimization enables a
decision-maker to adapt effectively to changes and improve its performance over
time. In such cases, it is favorable to adopt a strategy that minimizes the
negative impact of change to avoid potentially risky situations. In this paper,
we investigate risk-averse online optimization where the distribution of the
random cost changes over time. We minimize a risk-averse objective function using
the Conditional Value at Risk (CVaR) as the risk measure. Due to the difficulty in
obtaining the exact CVaR gradient, we employ a zeroth-order optimization
approach that queries the cost function values multiple times at each iteration
and estimates the CVaR gradient using the sampled values. To facilitate the
regret analysis, we use a variation metric based on Wasserstein distance to
capture time-varying distributions. Given that the distribution variation is
sub-linear in the total number of episodes, we show that our designed learning
algorithm achieves sub-linear dynamic regret with high probability for both
convex and strongly convex functions. Moreover, theoretical results suggest
that increasing the number of samples leads to a reduction in the dynamic
regret bounds until the number of samples reaches a specific limit. Finally, we
present numerical experiments on dynamic pricing in a parking lot to illustrate
the efficacy of the designed algorithm.
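
The sampling-based CVaR gradient estimate described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`empirical_cvar`, `zeroth_order_cvar_gradient`), the convention that `alpha` is the tail fraction (CVaR is the mean of the worst `alpha` share of costs), the two-point finite-difference estimator, the step size, and the toy drifting cost. The paper's own estimator and parameters may differ.

```python
import math
import random


def empirical_cvar(costs, alpha):
    """Empirical CVaR: mean of the largest alpha-fraction of sampled costs.

    Assumed convention: alpha in (0, 1] is the tail fraction.
    """
    k = max(1, math.ceil(alpha * len(costs)))
    worst = sorted(costs, reverse=True)[:k]
    return sum(worst) / k


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def zeroth_order_cvar_gradient(sample_cost, x, alpha, n_samples, delta, rng):
    """Two-point zeroth-order estimate of the CVaR gradient at x.

    sample_cost(x, rng) is a hypothetical oracle returning one random cost.
    Only cost values are queried; no gradient of the cost is used.
    """
    dim = len(x)
    u = random_unit_vector(dim, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    cvar_plus = empirical_cvar(
        [sample_cost(x_plus, rng) for _ in range(n_samples)], alpha)
    cvar_minus = empirical_cvar(
        [sample_cost(x_minus, rng) for _ in range(n_samples)], alpha)
    scale = dim * (cvar_plus - cvar_minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def run_demo(episodes=200, seed=0):
    """Toy run: the cost distribution drifts slowly from episode to episode."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for t in range(episodes):
        target = 1.0 + 0.001 * t  # slowly moving optimum (assumed drift)

        def sample_cost(point, r, target=target):
            # Quadratic cost around the current target plus Gaussian noise.
            return sum((p - target) ** 2 for p in point) + r.gauss(0.0, 0.1)

        grad = zeroth_order_cvar_gradient(
            sample_cost, x, alpha=0.1, n_samples=50, delta=0.1, rng=rng)
        x = [xi - 0.05 * gi for xi, gi in zip(x, grad)]
    return x


if __name__ == "__main__":
    print(run_demo())
```

In this sketch, raising `n_samples` reduces the noise in each empirical CVaR value, which loosely mirrors the abstract's remark that more samples help up to a point. The sketch does not reproduce the paper's regret analysis.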