Zeroth-Order Methods for Online Distributed Optimization with Strongly Pseudoconvex Cost Functions

Xiaoxi YAN, Muyuan MA, Kaihong LU

Journal of Systems Science and Information, 2024, Vol. 12, Issue (1): 145-160. DOI: 10.21078/JSSI-2023-0115

Abstract

This paper studies an online distributed optimization problem over multi-agent systems. In this problem, the goal of the agents is to cooperatively minimize the sum of local, time-varying cost functions. Different from most existing works on distributed optimization, here we consider the case where the cost function is strongly pseudoconvex and the real gradients of the objective functions are not available. To handle this problem, an online zeroth-order stochastic optimization algorithm involving a single-point gradient estimator is proposed. Under the algorithm, each agent only has access to the information associated with its own cost function and the estimate of the gradient, and exchanges local state information with its immediate neighbors via a time-varying digraph. The performance of the algorithm is measured by the expectation of the dynamic regret. Under mild assumptions on the graphs, we prove that if the cumulative deviation of the minimizer sequence grows within a certain rate, then the expectation of the dynamic regret grows sublinearly. Finally, a simulation example is given to illustrate the validity of our results.

Key words

multi-agent systems / strongly pseudoconvex function / single-point gradient estimator / online distributed optimization

Cite this article

Xiaoxi YAN, Muyuan MA, Kaihong LU. Zeroth-Order Methods for Online Distributed Optimization with Strongly Pseudoconvex Cost Functions. Journal of Systems Science and Information, 2024, 12(1): 145-160. https://doi.org/10.21078/JSSI-2023-0115

1 Introduction

In recent years, distributed optimization has attracted extensive attention in various fields[1-5], owing to its wide practicability in numerous areas such as distributed coordinated control[6], economic dispatch in power grids[7] and Internet of Things applications[8].
In practical applications, various convex optimization problems have been solved in recent years. The incremental subgradient algorithm is one of the earlier methods for solving convex optimization problems[9]. Furthermore, by combining the consensus algorithm with the subgradient algorithm, it is proved in [10] and [11] that all agents reach the optimal consensus state. In [11] and [12], consensus-based subgradient algorithms are proposed for distributed convex optimization problems with or without bounded closed convex constraint sets. In [13], a continuous-time subgradient algorithm is proposed for solving distributed convex optimization problems with general constraints. Moreover, for the multi-agent rendezvous problem, where each robot's dynamics evolve continuously, a distributed subgradient shortest-distance rendezvous algorithm is proposed in [14]. In [15], a distributed optimization algorithm based on gradient tracking is proposed for distributed convex optimization problems with set constraints. It is worth noting that all the above works rely on real gradient information.
However, in practical applications it is often infeasible or costly to compute gradient information accurately. For example, in the Internet of Things[16], fog computing cannot obtain a closed-form expression of the delay, since online decision-making has to adapt to user preferences and the availability of resources is temporarily unpredictable. Moreover, in bandit optimization[17], the player can only observe values of the function, not the objective function itself. To deal with such problems, zeroth-order stochastic optimization has recently attracted researchers' attention. In [18] and [19], by employing a two-point gradient estimator, several different zeroth-order optimization algorithms have been proposed for unconstrained problems. In [20], an adaptive distributed bandit primal-dual algorithm based on a two-point gradient estimator is proposed for distributed online stochastic convex optimization with time-varying constraints. In [21], a distributed stochastic approximation method of Kiefer-Wolfowitz type is developed for zeroth-order optimization over random networks. In [22], a distributed gradient descent algorithm is proposed for multi-agent optimization problems with set constraints, where the real gradient information is replaced by locally constructed random gradients. In [23], the convex optimization problem over time-varying networks is discussed in detail, and an algorithm that approximates the subgradient by a two-point (bilateral) random gradient is proposed.
It is worth noting that all the aforementioned investigations are conducted for convex optimization problems. However, pseudoconvex optimization problems[24] arise widely in practice, for example in fractional programming[25], economics[26] and solid mechanics[27]. Compared with convex optimization, pseudoconvex optimization is more general and also covers some nonconvex cases[25]. In fact, the online distributed optimization problem with strongly pseudoconvex cost functions has been studied in [28], where the real gradient information of the objective functions is required. However, real gradient information is often difficult to obtain in practical applications, or computing real gradients incurs high costs. Motivated by [18, 19, 28], we develop zeroth-order methods to deal with the online distributed optimization problem with strongly pseudoconvex cost functions.
In this paper, the problem of online distributed optimization with strongly pseudoconvex cost functions is studied, where the real gradients of the objective functions are not available. Different from [29-32], where the cost functions are assumed to be convex, here the cost functions are assumed to be strongly pseudoconvex, which is more general. Both the zeroth-order information and the pseudoconvexity of the cost functions decrease the convergence rate of the online algorithm and enlarge the bound of the dynamic regret, so achieving an effective regret bound in this setting is a great challenge. To solve the problem, an online zeroth-order stochastic optimization algorithm is proposed. In this algorithm, an auxiliary optimization-based strategy is adopted to update the states of the agents, and a single-point gradient estimator is used to estimate the local gradient. When running this algorithm, less real information is used than in the algorithm of [28], which makes our algorithm more practical. Different from [18] and [19], where each agent estimates the true gradient with a two-point gradient estimator, here the single-point gradient estimator can be realized even when the values of some random variables change rapidly. The performance of this algorithm is measured by the expectation of the dynamic regret. We prove that if the graph is G-strongly connected and the cumulative deviation of the minimizer sequence grows within a certain rate, then the expectation of the dynamic regret grows sublinearly.
This paper is organized as follows. In Section 2, we formulate the problem and propose an online zeroth-order stochastic optimization algorithm based on the single-point gradient estimator. In Section 3, we state the main results of this paper and give their proofs. A simulation example is given in Section 4. Section 5 concludes the paper.
Notations: The absolute value of a scalar $\mu$ is denoted by $|\mu|$. The set of real numbers is denoted by $\mathbb{R}$, and $\mathbb{N}$ denotes the set of positive integers. For any $T \in \mathbb{N}$, we denote the set $\mathbb{T} = \{0, 1, \ldots, T\}$. $\mathbb{R}^M$ denotes the $M$-dimensional real vector space. For given vectors $u, v \in \mathbb{R}^M$ and a matrix $P \in \mathbb{R}^{M \times M}$, we denote $\langle u, v\rangle_P = \langle Pu, v\rangle$, $\|\mu\|_P^2 = \mu^{\mathrm{T}} P \mu$, $\|\mu\| = \sqrt{\mu^{\mathrm{T}}\mu}$ and $\|\mu\|_1 = \sum_{j=1}^M |\mu_j|$, where $\mu_j$ represents the $j$th entry of the vector $\mu$. For a function $g(\cdot): \mathbb{R}^M \to \mathbb{R}$, we use $\nabla g(\mu)$ to represent the gradient of $g$ at $\mu$, and $\nabla^2 g(\mu)$ to denote its Hessian matrix. $I_n$ is the $n \times n$ identity matrix. $\mathbf{1}_M \in \mathbb{R}^M$ is the $M$-dimensional vector whose entries are all ones. For a matrix $N$, $[N]_{ij}$ denotes the entry in the $i$th row and $j$th column, and $[N]_i$ represents the $i$th row of $N$. $\lambda_{\max}(N)$ and $\lambda_{\min}(N)$ represent the maximum and minimum eigenvalues of $N$, and $\|N\| = \sqrt{\lambda_{\max}(N^{\mathrm{T}}N)}$.

2 Problem Formulation

2.1 Basic Graph Theory

A time-varying directed communication graph is denoted by $G(t) = (V, E(t), X(t))$, where $V = \{1, \ldots, n\}$ is the set of vertices and $E(t) \subseteq V \times V$ is the edge set. $X(t) = (x_{ij}(t))_{n \times n}$ is a non-negative matrix representing the weights of adjacent edges, where $x_{ij}(t) > l$ for some $l > 0$ if $(j, i) \in E(t)$, and $x_{ij}(t) = 0$ otherwise. We denote by $N_i(t) = \{j \in V \mid (j, i) \in E(t)\}$ the neighbor set of agent $i$ at time $t$, where $i \in N_i(t)$ for any $i \in V$. For a fixed topology $G = (V, E, X)$, a path of length $r$ between node $i_1$ and node $i_{r+1}$ is a sequence of $r+1$ distinct nodes $i_1, \ldots, i_{r+1}$ such that $(i_u, i_{u+1}) \in E$ for $u = 1, \ldots, r$. A graph is strongly connected if there is a path between any pair of distinct nodes. For $\{G(t)\}$, the $G$-edge set is defined as $E_G(t) = \bigcup_{k=tG}^{(t+1)G-1} E(k)$ for some constant $G > 0$. If the graph $(V, E_G(t))$ is strongly connected for any $t \ge 0$, then $\{G(t)\}$ is called $G$-strongly connected.
The following assumption is made for the graph.
Assumption 1 For any $t \ge 0$, $G(t)$ is a $G$-strongly connected graph and $X(t)$ is a doubly stochastic matrix.
The graph connectivity required in Assumption 1 plays an important role in enabling the agents to reach a common state[12, 33]. The following lemma from [12] is recalled.
Lemma 1[12] Under Assumption 1, for any $i, j \in V$, there exist constants $B > 0$ and $0 < \lambda < 1$ satisfying
$$\Big|[\Psi(t, m)]_{ij} - \frac{1}{n}\Big| \le B\lambda^{t-m}, \tag{1}$$
where $\Psi(t, m)$ is the state transition matrix of the consensus model $\varrho(t+1) = X(t)\varrho(t)$ with $\varrho(0) \in \mathbb{R}^n$, for any $t \ge m \ge 0$.
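For illustration (this construction is ours and is not part of the paper), Assumption 1 above requires doubly stochastic weight matrices; for an undirected snapshot of the communication graph, the Metropolis-Hastings rule is one common way to build such weights using only local degree information:

import numpy as np

def metropolis_weights(adj):
    # adj: symmetric 0/1 adjacency matrix of one graph snapshot (no self-loops).
    # Returns a doubly stochastic X with x_ij > 0 exactly on edges and self-loops.
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    X = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                X[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
        X[i, i] = 1.0 - X[i].sum()   # remaining mass on the self-loop
    return X

Each agent can compute its own row using only the degrees of its neighbors, so no global information is needed.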

2.2 Online Distributed Pseudoconvex Optimization

Consider a multi-agent system consisting of $n$ agents, labeled by the set $V = \{1, \ldots, n\}$. Agents communicate with each other through the digraph $\{G(t)\}$. For agent $i \in V$, a sequence of cost functions $\{g_i^1, \ldots, g_i^T\}$ is given, where $g_i^t: \Lambda \to \mathbb{R}$ is twice differentiable for any $t \in \mathbb{T}$, $T \in \mathbb{N}$ is unknown to the agents, and $\Lambda \subseteq \mathbb{R}^M$. At each iteration time $t \in \mathbb{T}$, agent $i$ selects a state $\mu_i(t) \in \Lambda$. Then agent $i$ receives the local cost function $g_i^t$, which means that the cost function information is not available until the agent has updated its action. In this case, at each iteration time $t$, the agents attempt to cooperatively solve the following optimization problem
$$\min \; g^t(\mu) = \sum_{i=1}^n g_i^t(\mu), \quad \text{subject to } \mu \in \Lambda. \tag{2}$$
For any $\mu \in \Lambda$ and a random vector $\zeta_i \in S_i \subseteq \mathbb{R}^M$, denote by $F_i(\mu; \zeta_i): \Lambda \times S_i \to \mathbb{R}$ the local cost function of agent $i$. Here $\zeta_i$ describes some non-additive random process, and the expected local cost function is $g_i(\mu) = \mathbb{E}_{\zeta}[F_i(\mu; \zeta_i)]$. Then (2) can be rewritten as
$$\min_{\mu \in \Lambda} \; g^t(\mu) = \sum_{i=1}^n \mathbb{E}_{\zeta}[F_i(\mu; \zeta_i)]. \tag{3}$$
Here some basic assumptions are made for the problem.
Assumption 2 $\Lambda$ is non-empty and compact, i.e., for any $p, q \in \Lambda$, there holds $\|p - q\| \le \kappa$, where $\kappa$ is a positive constant. Moreover, $\Lambda$ is a closed convex set.
Assumption 3 For any $t \in \mathbb{T}$, each $g^t$ is $o$-strongly pseudoconvex on $\Lambda$, i.e., for any $p, q \in \Lambda$, $\langle \nabla g^t(p), q - p\rangle \ge 0$ implies $\langle \nabla g^t(q), p - q\rangle \le -o\|q - p\|^2$ for some constant $o > 0$.
Remark 1 In Assumption 3, the gradient of the cost function satisfies strong pseudomonotonicity. Note that if $\nabla g^t$ is strongly pseudomonotone, then $g^t$ is strongly pseudoconvex[34]. The definitions of pseudoconvex, quasiconvex and convex functions and their relationships are as follows. For a given differentiable function $g(\cdot): \mathbb{R}^m \to \mathbb{R}$, $g(\cdot)$ is pseudoconvex on $\Lambda \subseteq \mathbb{R}^m$ if for any two distinct points $u, v \in \Lambda$, $\langle \nabla g(u), v - u\rangle \ge 0$ implies $g(v) - g(u) \ge 0$. Moreover, if for any two distinct points $u, v \in \Lambda$, $\langle \nabla g(u), v - u\rangle \ge 0$ implies $g(v) - g(u) \ge \frac{o}{2}\|v - u\|^2$ for some constant $o > 0$, then $g(\cdot)$ is called an $o$-strongly pseudoconvex function on $\Lambda$. $g(\cdot)$ is convex on $\Lambda$ if for any two distinct points $u, v \in \Lambda$, $g(v) - g(u) \ge \langle \nabla g(u), v - u\rangle$. Finally, $g(\cdot)$ is quasiconvex on $\Lambda$ if for any two distinct points $u, v \in \Lambda$ and any $0 \le \nu \le 1$, $g(\nu v + (1 - \nu)u) \le \max\{g(u), g(v)\}$. According to these definitions, it is not difficult to verify that the class of quasiconvex functions includes convex and pseudoconvex functions, and that convex functions are a special case of pseudoconvex functions. Compared with convex optimization[30-32], strongly pseudoconvex optimization problems are more general. Quasiconvex optimization is more complex than pseudoconvex optimization and is an even more general nonconvex problem; we will consider the quasiconvex case in future work.
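As a worked example (ours, not taken from the paper), the scalar function $g(\mu) = \mu + \mu^3$ is strongly pseudoconvex on any bounded set but is not convex on intervals containing negative points:
$$\nabla g(\mu) = 1 + 3\mu^2 > 0, \qquad \langle \nabla g(u), v - u\rangle \ge 0 \iff v \ge u,$$
$$g(v) - g(u) = (v - u)\big(1 + u^2 + uv + v^2\big) \ge v - u \ge \frac{1}{\kappa}(v - u)^2 \quad \text{for } v \ge u,\ |v - u| \le \kappa,$$
so $g$ is $o$-strongly pseudoconvex with $o = 2/\kappa$ on a set of diameter $\kappa$, while $\nabla^2 g(\mu) = 6\mu < 0$ for $\mu < 0$, so $g$ is not convex there.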
Assumption 4 For any $i \in V$, $|F_i(\mu; \zeta_i)| \le C_1 < +\infty$ on $\Lambda$.
Any online algorithm should mimic the performance of its offline counterpart, and the gap between them is called regret. If the offline benchmark for the agents is to minimize $\sum_{t=0}^T g^t(\mu)$, then the regret is called static regret[1], defined as
$$R_i^s(T) = \sum_{t=0}^T g^t(\mu_i(t)) - \sum_{t=0}^T g^t(\mu^*), \quad i \in V, \tag{4}$$
where $\mu^* = \arg\min_{\mu \in \Lambda}\sum_{t=0}^T g^t(\mu)$. Here, instead, the offline benchmark for the agents is to minimize $g^t(\mu)$ at each time, and the resulting regret is called dynamic regret[34], defined as
$$R_i^d(T) = \sum_{t=0}^T g^t(\mu_i(t)) - \sum_{t=0}^T g^t(\mu^*(t)), \quad i \in V, \tag{5}$$
where $\mu^*(t) = \arg\min_{\mu \in \Lambda} g^t(\mu)$ for any $t \in \mathbb{T}$. The offline benchmark of dynamic regret (5) is more stringent than that of static regret, and sublinear dynamic regret may be unattainable in the worst case. Inspired by [30-32], we use the following deviation of the minimizer sequence $\{\mu^*(t)\}_{t=0}^T$ to describe the difficulty of the problem:
$$\Gamma_T = \sum_{t=0}^T \|\mu^*(t+1) - \mu^*(t)\|. \tag{6}$$
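For completeness, the short Python sketch below (our illustration; the function global_cost(t, mu), standing for $g^t(\mu)$, is assumed to be supplied by the experiment) evaluates the dynamic regret (5) and the minimizer deviation (6) from recorded trajectories:

import numpy as np

def dynamic_regret(mu_i, mu_star, global_cost):
    # mu_i, mu_star: arrays of shape (T+1, M) holding agent i's states mu_i(t)
    # and the per-time minimizers mu*(t); global_cost(t, mu) returns g^t(mu).
    T = len(mu_i) - 1
    return sum(global_cost(t, mu_i[t]) - global_cost(t, mu_star[t])
               for t in range(T + 1))

def minimizer_deviation(mu_star):
    # Gamma_T in (6): cumulative path length of the supplied minimizer sequence.
    diffs = np.diff(mu_star, axis=0)
    return float(np.linalg.norm(diffs, axis=1).sum())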

2.3 Online Distributed Algorithms

Consider the following centralized optimization problem
$$\min \; g(\mu), \quad \text{subject to } \mu \in \Lambda, \tag{7}$$
where $\Lambda$ is a closed convex set, $g: \mathbb{R}^m \to \mathbb{R}$ is a strongly pseudoconvex function, and the real gradient of the objective function is not available. To address this problem, motivated by [35], we construct a gradient estimator as follows.
Consider a gradient estimator based on a single-point evaluation of the local cost function. At each time $t$, agent $i$ estimates the gradient of its local cost function by
$$\hat{h}_i(t) = v_{i,t}\,\frac{F_i\big(\mu_i(t) + \delta(t)v_{i,t};\, \zeta_{i,t}\big)}{\delta(t)}, \tag{8}$$
where $\delta(t) > 0$ is the smoothing step-size and $v_{i,t} \in [-V, V]^M \subseteq \mathbb{R}^M$ is a random perturbation vector generated independently by each agent.
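A minimal Python sketch of the single-point estimator (8) (our own wrapper; local_cost(mu, t) stands for one evaluation of $F_i(\cdot; \zeta_{i,t})$, and $v_{i,t}$ is drawn uniformly from $[-V, V]^M$):

import numpy as np

def single_point_gradient(local_cost, mu, t, delta_t, V=1.0, rng=np.random):
    # hat h_i(t) = v_{i,t} * F_i(mu_i(t) + delta(t) v_{i,t}; zeta_{i,t}) / delta(t)
    v = rng.uniform(-V, V, size=mu.shape)     # random perturbation v_{i,t}
    return v * local_cost(mu + delta_t * v, t) / delta_t

Only one function evaluation per time step is needed, which is what makes the estimator usable when the random variables $\zeta_{i,t}$ change rapidly between evaluations.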
To solve (2), an online zeroth-order stochastic optimization algorithm is proposed as follows
$$\begin{cases} \mu_i(t+1) = \arg\min_{\mu \in \Lambda}\big\{\mu^{\mathrm{T}}P\mu + \theta(t)\langle \hat{h}_i(t), \mu\rangle - 2\langle Pz_i(t), \mu\rangle\big\}, \\ z_i(t) = \sum_{j \in N_i(t)} x_{ij}(t)\mu_j(t), \\ \hat{h}_i(t) = v_{i,t}\,\dfrac{F_i\big(\mu_i(t) + \delta(t)v_{i,t};\, \zeta_{i,t}\big)}{\delta(t)}, \end{cases} \tag{9}$$
for any $i \in V$, where $\mu_i(t)$ is the state of agent $i$, used to estimate the minimizer of the objective function at time $t \in \mathbb{T}$, with $\mu_i(0) = \mu_{i0} \in \Lambda$; $P$ is a symmetric positive definite matrix; and $\theta(t)$ and $\delta(t)$ are positive, decaying learning rates with initial values $\theta(0) = \theta_0 > 0$ and $\delta(0) = \delta_0 > 0$.
Motivated by the single-point gradient estimation strategy[35], the consensus algorithm[36], and the auxiliary optimization algorithm[33], we propose algorithm (9). The consensus term $z_i(t)$ is inspired by the consensus algorithms in [10] and [36]. Under this algorithm, each agent updates its action according to its own state, the state information received from its neighbors at the current moment, and the gradient estimate. This makes algorithm (9) both distributed and online.
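The sketch below (ours) puts the pieces of (9) together for one agent under two simplifying assumptions: $P = I$ and a box constraint $\Lambda$, in which case the auxiliary minimization reduces to projecting $z_i(t) - \tfrac{\theta(t)}{2}\hat{h}_i(t)$ onto $\Lambda$:

import numpy as np

def agent_update(mu_list, weights, i, local_cost, t, theta_t, delta_t,
                 lo, hi, V=1.0, rng=np.random):
    # mu_list: list of all agents' current states; weights: the i-th row of X(t).
    # Consensus term z_i(t) = sum_{j in N_i(t)} x_ij(t) mu_j(t) (i is its own neighbor).
    z = sum(w * m for w, m in zip(weights, mu_list))
    # Single-point gradient estimate (8).
    v = rng.uniform(-V, V, size=mu_list[i].shape)
    h_hat = v * local_cost(mu_list[i] + delta_t * v, t) / delta_t
    # Auxiliary optimization step of (9) with P = I: Euclidean projection onto the box.
    return np.clip(z - 0.5 * theta_t * h_hat, lo, hi)

With $P = I$, the objective in (9) equals $\|\mu - (z_i(t) - \tfrac{\theta(t)}{2}\hat{h}_i(t))\|^2$ up to a constant, which is why the update is a projection here.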
Remark 2 To implement the algorithm, the matrix $P$ should be known a priori by all agents, which may prevent the proposed algorithm from being fully distributed. In fact, over a balanced and periodically strongly connected ($G$-strongly connected) communication graph, it is not difficult to determine a common matrix $P$ for all agents using only local information. For example, let the local initial state of each agent be a symmetric positive definite matrix $P_i$, $i \in V$. Each agent updates its state by the average-consensus algorithm
$$P_i(t+1) = \sum_{j=1}^n x_{ij}(t)P_j(t);$$
then all agents' states converge to $\frac{1}{n}\sum_{i=1}^n P_i$, which allows every agent to obtain a common symmetric positive definite matrix $P$.
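A brief sketch (ours) of this average-consensus iteration with local matrices $P_i$ and doubly stochastic weights $X(t)$:

import numpy as np

def consensus_on_P(P_list, weight_seq):
    # P_list: the agents' initial symmetric positive definite matrices P_i;
    # weight_seq: iterable of n x n doubly stochastic matrices X(t).
    P = [Pi.copy() for Pi in P_list]
    for X in weight_seq:
        P = [sum(X[i, j] * P[j] for j in range(len(P))) for i in range(len(P))]
    return P   # every entry approaches (1/n) * sum_i P_i(0)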

3 Main Results

Theorem 1 Under Assumptions 1, 2, 3 and 4, if the learning rates are selected as $\theta(t) = \frac{\alpha}{(t+1)^{2/3}}$ and $\delta(t) = \frac{\beta}{(t+1)^{1/6}}$, where $\alpha, \beta > 0$, then for any $i \in V$ and learning time $T \in \mathbb{N}$, we have
$$\mathbb{E}[R_i^d(T)] \le n\delta_1\sqrt{\Big(Q + \frac{16KL\Gamma_T}{o\alpha\ln 2}\Big)(T+1)^{5/6}\ln(T+1)}, \tag{10}$$
where $Q = \frac{C_2 + 3C_3\alpha^2/\beta^2 + 24C_5\lambda\alpha/(o\beta^2) + 12\kappa M^{3/2}V^2\sigma_1/(o\alpha)}{\lambda(1-\lambda)\ln 2} + \frac{4d}{o\alpha}$, $\xi = BM\sum_{i=1}^n \|\mu_i(0)\|_1$, $S_1 = \xi^2 + \frac{n\xi BC_1\sqrt{M}V\theta_0}{\sigma(1-\lambda)\delta_0}$, $S_2 = \frac{(nB\sqrt{M}VC_1)^2}{4\sigma^2(1-\lambda)}$, $C_2 = \frac{4C_4}{o\alpha} + 2S_1$, $C_3 = \frac{20nLS_2}{o\alpha} + 2S_2$, $C_4 = 5nLS_1 + n(\kappa\sigma_1 + \delta_1)\alpha$, $C_5 = 5nLS_2$, $d = L\kappa$, $K = \sup_{\mu\in\Lambda}\|\mu\|$, $\delta_1 = \sup_{t\in\mathbb{T},\, i\in V,\, \mu\in\Lambda}\|\nabla g_i^t(\mu)\|$, $L = n\lambda_{\max}(P)$, $\sigma = \lambda_{\min}(P)$, and $\sigma_1 = \sup_{t\in\mathbb{T},\, i\in V,\, \mu\in\Lambda}\|\nabla^2 g_i^t(\mu)\|$.
Remark 3 By Theorem 1, $\Gamma_T$ plays an important role in the sublinearity of the dynamic regret bound. Note that $\lim_{T\to\infty}\frac{\sqrt{(T+1)^{5/6}\ln(T+1)}}{T} = 0$. If $\Gamma_T$ grows sublinearly at a rate no faster than $(T+1)^{1/3}\ln(T+1)$, then $\lim_{T\to\infty}\frac{\sqrt{\Gamma_T(T+1)^{5/6}\ln(T+1)}}{T} = 0$, which implies that the online distributed algorithm (9) solves problem (2) well. Therefore, algorithm (9) is suitable for solving strongly pseudoconvex optimization problems where the real gradients of the objective function are not available. If the minimizer sequence $\{\mu^*(t)\}_{t=0}^T$ changes drastically, $\Gamma_T$ may grow faster than $(T+1)^{1/3}\ln(T+1)$, and then the performance of algorithm (9) cannot be guaranteed. This is natural, since the problem is also unsolvable in the worst case even in online convex optimization[30-32]. In the convergence analysis, there is an error between the estimated gradient and the real gradient, which enlarges the bound of the dynamic regret. To overcome this difficulty, we construct a smoothed function to establish the relationship between the estimated gradient and the real gradient. By selecting an appropriate step-size for the smoothing function and an appropriate learning rate for the online algorithm, we prove that the expectation of the regret grows sublinearly. In addition, the convergence rate of this algorithm is slower than that of [28]. This is natural, since the algorithm uses less information and does not directly use the real gradient information.
Before giving the proof of Theorem 1, some useful lemmas need to be provided. First, the Karush-Kuhn-Tucker (KKT) condition for pseudoconvex optimization is recalled.
Lemma 2[1] $\mu^*$ is a minimum point of $g(\mu)$ on $\Upsilon$ if and only if $\langle \nabla g(\mu^*), \mu - \mu^*\rangle \ge 0$ for all $\mu \in \Upsilon$.
Lemma 3[35] For any $i \in V$ and $t \in \mathbb{T}$, the random perturbation vector satisfies $\|v_{i,t}\| \le \sqrt{M}V$ and $\|\hat{h}_i(t)\| \le \frac{C_1\sqrt{M}V}{\delta(t)}$. Moreover, there holds $\nabla g_i^t(\mu_i(t)) = \mathbb{E}[\hat{h}_i(t) \mid \mu_i(t)] - b_i(t)$, where $b_i(t)$ denotes the estimation bias of the gradient estimator and satisfies $\|b_i(t)\| \le \frac{1}{2}M^{3/2}V^2\sigma_1\delta(t)$.
Now we present the following lemma, which gives an upper bound on the error between each agent's state and the average state at each iteration time under algorithm (9).
Lemma 4 Under Assumptions 1, 2, 3 and 4, for any $t \in \mathbb{T}$ and $i \in V$,
$$\|\mu_i(t) - \bar{\mu}(t)\| \le \xi\lambda^t + \frac{nB\sqrt{M}VC_1}{2\sigma}\sum_{\tau=0}^t \lambda^{t-\tau}\frac{\theta(\tau)}{\delta(\tau)} \tag{11}$$
and
$$\|\mu_i(t) - \bar{\mu}(t)\|^2 \le S_1\lambda^t + S_2\sum_{\tau=0}^t \lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2, \tag{12}$$
where $\bar{\mu}(t) = \frac{1}{n}\sum_{i=1}^n \mu_i(t)$.
Proof. Applying the KKT condition in Lemma 2 to the first equation of algorithm (9), there holds
$$\langle \mu_i(t+1) - z_i(t), \mu_i(t+1) - \mu\rangle_P \le \frac{\theta(t)}{2}\langle \hat{h}_i(t), \mu - \mu_i(t+1)\rangle$$
for any $\mu \in \Lambda$. By the convexity of $\Lambda$, we have $z_i(t) \in \Lambda$ whenever $\mu_i(t) \in \Lambda$ for all $i \in V$. Letting $\mu = z_i(t)$, together with the facts that $\sigma\|\mu_i(t+1) - z_i(t)\|^2 \le \|\mu_i(t+1) - z_i(t)\|_P^2$ and $\|\hat{h}_i(t)\| \le \frac{C_1\sqrt{M}V}{\delta(t)}$, we have
$$\sigma\|\mu_i(t+1) - z_i(t)\|^2 \le \frac{C_1\sqrt{M}V\theta(t)}{2\delta(t)}\|\mu_i(t+1) - z_i(t)\|. \tag{13}$$
Denote $q_i(t) = \mu_i(t+1) - z_i(t)$; then (13) gives $\|q_i(t)\| \le \frac{C_1\sqrt{M}V\theta(t)}{2\sigma\delta(t)}$. Moreover,
$$\mu_i(t+1) = \sum_{j \in N_i(t)} x_{ij}(t)\mu_j(t) + q_i(t).$$
Let $\tilde{\mu}_\gamma(t) \in \mathbb{R}^n$ and $\tilde{q}_\gamma(t) \in \mathbb{R}^n$ denote the stacks over $i \in V$ of the $\gamma$th entries of $\mu_i(t)$ and $q_i(t)$, respectively. Their relationship is
$$\tilde{\mu}_\gamma(t+1) = X(t)\tilde{\mu}_\gamma(t) + \tilde{q}_\gamma(t),$$
which implies that
$$\tilde{\mu}_\gamma(t) = \Psi(t, 0)\tilde{\mu}_\gamma(0) + \sum_{\tau=1}^t \Psi(t, \tau)\tilde{q}_\gamma(\tau - 1). \tag{14}$$
Since every $X(t)$ is doubly stochastic, $\Psi(t, m)$ is doubly stochastic for any $t \ge m \ge 0$, and hence
$$\mathbf{1}^{\mathrm{T}}\tilde{\mu}_\gamma(t) = \mathbf{1}^{\mathrm{T}}\tilde{\mu}_\gamma(0) + \sum_{\tau=1}^t \mathbf{1}^{\mathrm{T}}\tilde{q}_\gamma(\tau - 1). \tag{15}$$
From (14) and (15), there holds
$$\begin{aligned} \Big|[\tilde{\mu}_\gamma(t)]_i - \frac{1}{n}\mathbf{1}^{\mathrm{T}}\tilde{\mu}_\gamma(t)\Big| &\le \Big|\Big([\Psi(t,0)]_i - \frac{1}{n}\mathbf{1}^{\mathrm{T}}\Big)\tilde{\mu}_\gamma(0)\Big| + \sum_{\tau=1}^t \Big|\Big([\Psi(t,\tau)]_i - \frac{1}{n}\mathbf{1}^{\mathrm{T}}\Big)\tilde{q}_\gamma(\tau - 1)\Big| \\ &\le \max_{1\le j\le n}\Big|[\Psi(t,0)]_{ij} - \frac{1}{n}\Big|\,\|\tilde{\mu}_\gamma(0)\|_1 + \frac{nC_1\sqrt{M}V}{2\sigma}\sum_{\tau=1}^t \frac{\theta(\tau-1)}{\delta(\tau-1)}\max_{1\le j\le n}\Big|[\Psi(t,\tau)]_{ij} - \frac{1}{n}\Big| \end{aligned}$$
for any $i \in V$. By (1), there holds
$$\Big|[\tilde{\mu}_\gamma(t)]_i - \frac{1}{n}\mathbf{1}^{\mathrm{T}}\tilde{\mu}_\gamma(t)\Big| \le B\lambda^t\|\tilde{\mu}_\gamma(0)\|_1 + \frac{nC_1\sqrt{M}VB}{2\sigma}\sum_{\tau=0}^t \lambda^{t-\tau}\frac{\theta(\tau)}{\delta(\tau)},$$
which proves (11). Moreover, since $\theta(t)/\delta(t)$ is non-increasing and $0 < \lambda < 1$, we have
$$\|\mu_i(t) - \bar{\mu}(t)\|^2 \le \Big(\xi^2 + \frac{n\xi BC_1\sqrt{M}V\theta_0}{\sigma(1-\lambda)\delta_0}\Big)\lambda^t + \frac{(nB\sqrt{M}VC_1)^2}{4\sigma^2}\Big(\sum_{\tau=0}^t \lambda^{t-\tau}\frac{\theta(\tau)}{\delta(\tau)}\Big)^2. \tag{16}$$
Using the Cauchy-Schwarz inequality yields
$$\Big(\sum_{\tau=0}^t \lambda^{t-\tau}\frac{\theta(\tau)}{\delta(\tau)}\Big)^2 \le \Big(\sum_{\tau=0}^t \lambda^{t-\tau}\Big)\Big(\sum_{\tau=0}^t \lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2\Big) \le \frac{1}{1-\lambda}\sum_{\tau=0}^t \lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2. \tag{17}$$
Substituting (17) into (16) proves the validity of (12).
Next, we give a cumulative upper bound on the error between the average state of the agents and the minimizer at each iteration time.
Lemma 5 Under Assumptions 1, 2 and 3, if $\theta(t)$ is non-increasing, there holds
$$\begin{aligned} \sum_{t=0}^T \|\bar{\mu}(t) - \mu^*(t)\|^2 \le\; & \frac{2C_4}{o\theta(T)}\sum_{t=0}^T \lambda^t + \frac{10nLS_2}{o\theta(T)}\sum_{t=0}^T\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2 + \frac{8KL\Gamma_T}{o\theta(T)} \\ & + \frac{2\sum_{t=0}^T \omega}{o\theta(T)} + \frac{2d}{o\theta(T)} + \frac{2C_5}{o\theta(T)}\sum_{t=0}^T\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\frac{(\theta(\tau))^2}{\delta(\tau)}. \end{aligned} \tag{18}$$
To prove Lemma 5, we give an auxiliary lemma as follows.
Lemma 6 Under Assumptions 1, 2 and 3, for any $y \in \mathbb{R}^M$ and $t \in \mathbb{T}$,
$$\sum_{i=1}^n \langle \mu_i(t) - z_i(t), y - \mu_i(t+1)\rangle_P \le \frac{5nL}{2}S_1\lambda^t + \frac{5nL}{2}S_2\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2. \tag{19}$$
Proof. Note that
$$\begin{aligned} \sum_{i=1}^n \langle \mu_i(t) - z_i(t), y - \mu_i(t+1)\rangle_P &= \sum_{i=1}^n \langle \mu_i(t) - z_i(t), y - \bar{\mu}(t+1)\rangle_P + \sum_{i=1}^n \langle \mu_i(t) - z_i(t), \bar{\mu}(t+1) - \mu_i(t+1)\rangle_P \\ &\le \Big\langle \sum_{i=1}^n \big(\mu_i(t) - z_i(t)\big),\, y - \bar{\mu}(t+1)\Big\rangle_P + L\sum_{i=1}^n \|\mu_i(t) - z_i(t)\|\,\|\bar{\mu}(t+1) - \mu_i(t+1)\|. \end{aligned}$$
Note that $\sum_{i=1}^n \big(\mu_i(t) - z_i(t)\big) = 0$ since $X(t)$ is doubly stochastic, so the first term on the right-hand side vanishes. By Young's inequality, there holds
$$\sum_{i=1}^n \langle \mu_i(t) - z_i(t), y - \mu_i(t+1)\rangle_P \le 2L\sum_{i=1}^n \|\mu_i(t) - \bar{\mu}(t)\|^2 + \frac{L}{2}\sum_{i=1}^n \|\bar{\mu}(t+1) - \mu_i(t+1)\|^2. \tag{20}$$
Using (12) and the fact that $0 < \lambda < 1$ in (20) yields (19).
Proof (of Lemma 5). Note that
$$\begin{aligned} & \frac{1}{2}\|\mu^*(t+1) - \mu_i(t+1)\|_P^2 - \frac{1}{2}\|\mu^*(t) - \mu_i(t)\|_P^2 \\ &\quad = -\frac{1}{2}\|\mu_i(t) - \mu_i(t+1)\|_P^2 + \langle \mu_i(t) - \mu_i(t+1), \mu^*(t) - \mu_i(t+1)\rangle_P + \Big\langle \frac{1}{2}\big(\mu^*(t+1) + \mu^*(t)\big) - \mu_i(t+1),\, \mu^*(t+1) - \mu^*(t)\Big\rangle_P \end{aligned} \tag{21}$$
for any $i \in V$. Now denote $D(t) = \frac{1}{2}\sum_{i=1}^n \|\mu^*(t) - \mu_i(t)\|_P^2$. There holds
$$\begin{aligned} \Delta D(t) &= D(t+1) - D(t) \\ &= \sum_{i=1}^n\Big[-\frac{1}{2}\|\mu_i(t) - \mu_i(t+1)\|_P^2 + \langle \mu_i(t) - \mu_i(t+1), \mu^*(t) - \mu_i(t+1)\rangle_P + \Big\langle \frac{1}{2}\big(\mu^*(t+1) + \mu^*(t)\big) - \mu_i(t+1),\, \mu^*(t+1) - \mu^*(t)\Big\rangle_P\Big] \\ &\le \sum_{i=1}^n \langle \mu_i(t) - \mu_i(t+1), \mu^*(t) - \mu_i(t+1)\rangle_P - \sum_{i=1}^n \frac{\sigma}{2}\|\mu_i(t) - \mu_i(t+1)\|^2 + 2KL\|\mu^*(t+1) - \mu^*(t)\|. \end{aligned} \tag{22}$$
Then, by the KKT condition, there holds
$$\langle z_i(t) - \mu_i(t+1), \mu^*(t) - \mu_i(t+1)\rangle_P \le \frac{\theta(t)}{2}\langle \hat{h}_i(t), \mu^*(t) - \mu_i(t+1)\rangle$$
for any $i \in V$. By (19) with $y = \mu^*(t)$, we have
$$\begin{aligned} \sum_{i=1}^n \langle \mu_i(t) - \mu_i(t+1), \mu^*(t) - \mu_i(t+1)\rangle_P &= \sum_{i=1}^n \langle z_i(t) - \mu_i(t+1), \mu^*(t) - \mu_i(t+1)\rangle_P + \sum_{i=1}^n \langle \mu_i(t) - z_i(t), \mu^*(t) - \mu_i(t+1)\rangle_P \\ &\le \frac{\theta(t)}{2}\sum_{i=1}^n\langle \hat{h}_i(t), \mu^*(t) - \mu_i(t+1)\rangle + \frac{5nL}{2}S_1\lambda^t + \frac{5nL}{2}S_2\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2. \end{aligned} \tag{23}$$
Under Assumption 3 and by Lemma 2, we have
$$\sum_{i=1}^n \langle \nabla g_i^t(\bar{\mu}(t)), \bar{\mu}(t) - \mu^*(t)\rangle \ge \frac{o}{2}\|\bar{\mu}(t) - \mu^*(t)\|^2.$$
Note that $\|\nabla^2 g_i^t\| \le \sigma_1$ on $\Lambda$, so $\|\nabla g_i^t(\mu_i(t)) - \nabla g_i^t(\bar{\mu}(t))\| \le \sigma_1\|\mu_i(t) - \bar{\mu}(t)\|$ for any $i \in V$. Since $\Lambda$ is bounded by Assumption 2, there holds
$$\begin{aligned} \sum_{i=1}^n \langle \nabla g_i^t(\mu_i(t)), \mu^*(t) - \mu_i(t)\rangle &= \sum_{i=1}^n \langle \nabla g_i^t(\mu_i(t)) - \nabla g_i^t(\bar{\mu}(t)), \mu^*(t) - \mu_i(t)\rangle + \sum_{i=1}^n \langle \nabla g_i^t(\bar{\mu}(t)), \bar{\mu}(t) - \mu_i(t)\rangle - \sum_{i=1}^n \langle \nabla g_i^t(\bar{\mu}(t)), \bar{\mu}(t) - \mu^*(t)\rangle \\ &\le \sum_{i=1}^n (\kappa\sigma_1 + \delta_1)\|\mu_i(t) - \bar{\mu}(t)\| - \frac{o}{2}\|\bar{\mu}(t) - \mu^*(t)\|^2. \end{aligned}$$
Then,
$$\begin{aligned} \theta(t)\sum_{i=1}^n \langle \hat{h}_i(t), \mu^*(t) - \mu_i(t+1)\rangle &= \theta(t)\sum_{i=1}^n \langle \hat{h}_i(t), \mu^*(t) - \mu_i(t)\rangle + \theta(t)\sum_{i=1}^n \langle \hat{h}_i(t), \mu_i(t) - \mu_i(t+1)\rangle \\ &= \theta(t)\sum_{i=1}^n \langle \hat{h}_i(t) - \nabla g_i^t(\mu_i(t)), \mu^*(t) - \mu_i(t)\rangle + \theta(t)\sum_{i=1}^n \langle \nabla g_i^t(\mu_i(t)), \mu^*(t) - \mu_i(t)\rangle \\ &\quad + \theta(t)\sum_{i=1}^n \langle \hat{h}_i(t), \mu_i(t) - \mu_i(t+1)\rangle. \end{aligned} \tag{24}$$
By Young's inequality, we have
$$\theta(t)\sum_{i=1}^n \langle \hat{h}_i(t), \mu_i(t) - \mu_i(t+1)\rangle \le \sum_{i=1}^n \frac{\sigma}{2}\|\mu_i(t) - \mu_i(t+1)\|^2 + \frac{nMV^2C_1^2\theta(t)^2}{2\sigma\delta(t)^2}. \tag{25}$$
Denote $\omega = \theta(t)\sum_{i=1}^n \langle \hat{h}_i(t) - \nabla g_i^t(\mu_i(t)), \mu^*(t) - \mu_i(t)\rangle$. Combining (22)-(25) with algorithm (9), there holds
$$\begin{aligned} \Delta D(t) \le\; & \frac{1}{2}\omega - \frac{o\theta(t)}{4}\|\bar{\mu}(t) - \mu^*(t)\|^2 + \frac{nMV^2C_1^2}{4\sigma}\Big(\frac{\theta(t)}{\delta(t)}\Big)^2 + 2KL\|\mu^*(t) - \mu^*(t+1)\| \\ & + \frac{C_4}{2}\lambda^t + \frac{C_5}{2}\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\frac{(\theta(\tau))^2}{\delta(\tau)} + \frac{5nLS_2}{2}\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2. \end{aligned} \tag{26}$$
Due to the fact that $D(t) \ge 0$ for any $t \in \mathbb{T}$, there holds $\sum_{t=0}^T \Delta D(t) = D(T+1) - D(0) \ge -D(0) \ge -d/2$. Summing both sides of (26) from $t = 0$ to $T$ and rearranging then yields (18) in Lemma 5.
Proof (of Theorem 1). By Lemma 4 and Lemma 5, for any $i \in V$, there holds
$$\begin{aligned} \sum_{t=0}^T \|\mu_i(t) - \mu^*(t)\|^2 &\le 2\sum_{t=0}^T \|\bar{\mu}(t) - \mu^*(t)\|^2 + 2\sum_{t=0}^T \|\mu_i(t) - \bar{\mu}(t)\|^2 \\ &\le \Big(\frac{4C_4}{o\theta(T)} + 2S_1\Big)\sum_{t=0}^T \lambda^t + \frac{16KL\Gamma_T}{o\theta(T)} + \frac{4\sum_{t=0}^T\omega}{o\theta(T)} + \Big(\frac{20nLS_2}{o\theta(T)} + 2S_2\Big)\sum_{t=0}^T\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2 \\ &\quad + \frac{4C_5}{o\theta(T)}\sum_{t=0}^T\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\frac{(\theta(\tau))^2}{\delta(\tau)} + \frac{4d}{o\theta(T)}. \end{aligned} \tag{27}$$
Let $\theta(t) = \frac{\alpha}{(t+1)^{2/3}}$ and $\delta(t) = \frac{\beta}{(t+1)^{1/6}}$. Note that $\sum_{t=0}^T \frac{1}{1+t} = \sum_{t=1}^{T+1}\frac{1}{t} \le 1 + \int_1^{T+1}\frac{\mathrm{d}t}{t} = 1 + \ln(T+1)$. Similarly, $\sum_{t=0}^T \frac{1}{(1+t)^{7/6}} \le 7 - 6(T+1)^{-1/6}$, $\sum_{t=0}^T \frac{1}{(1+t)^{5/6}} \le 6(T+1)^{1/6} - 6$ and $\sum_{t=0}^T \lambda^t \le \frac{1}{1-\lambda}$. We then have
$$\begin{aligned} \sum_{t=0}^T\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2 &= \sum_{\tau=1}^{T+1}\sum_{t=\tau-1}^{T}\lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2 + \sum_{t=0}^{T}\lambda^t\Big(\frac{\theta_0}{\delta_0}\Big)^2 \\ &\le \frac{1}{\lambda(1-\lambda)}\sum_{\tau=1}^{T+1}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2 + \frac{1}{1-\lambda}\Big(\frac{\theta_0}{\delta_0}\Big)^2 \le \frac{3\alpha^2\ln(T+1)}{\beta^2\lambda(1-\lambda)\ln 2}, \end{aligned} \tag{28}$$
where the first equality follows by exchanging the order of summation, and the last inequality uses $\ln(T+1) \ge \ln 2$ for any $T \ge 1$. Similarly, we have $\sum_{t=0}^T\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\frac{(\theta(\tau))^2}{\delta(\tau)} \le \frac{6\alpha^2}{\beta^2\lambda(1-\lambda)}$.
From Lemma 3, taking the total expectation of $\omega$ yields
$$\sum_{t=0}^T \mathbb{E}[\omega] \le \frac{1}{2}\kappa M^{3/2}V^2\sigma_1\sum_{t=0}^T \theta(t)\delta(t) \le \frac{1}{2}\kappa M^{3/2}V^2\sigma_1\big(6(T+1)^{1/6} - 6\big) \le 3\kappa M^{3/2}V^2\sigma_1(T+1)^{1/6}. \tag{29}$$
Taking expectations on both sides of (27) results in
$$\begin{aligned} \sum_{t=0}^T \mathbb{E}\|\mu_i(t) - \mu^*(t)\|^2 &\le \Big(\frac{4C_4}{o\theta(T)} + 2S_1\Big)\sum_{t=0}^T \lambda^t + \frac{16KL\Gamma_T}{o\theta(T)} + \frac{12\kappa M^{3/2}V^2\sigma_1(T+1)^{1/6}}{o\theta(T)} \\ &\quad + \Big(\frac{20nLS_2}{o\theta(T)} + 2S_2\Big)\sum_{t=0}^T\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\Big(\frac{\theta(\tau)}{\delta(\tau)}\Big)^2 + \frac{4C_5}{o\theta(T)}\sum_{t=0}^T\sum_{\tau=0}^{t+1}\lambda^{t-\tau}\frac{(\theta(\tau))^2}{\delta(\tau)} + \frac{4d}{o\theta(T)}. \end{aligned} \tag{30}$$
Then, using Jensen's inequality, we have
$$\Big(\sum_{t=0}^T \|\mu_i(t) - \mu^*(t)\|\Big)^2 \le (T+1)\sum_{t=0}^T \|\mu_i(t) - \mu^*(t)\|^2. \tag{31}$$
Taking expectations on both sides of (31) gives
$$\mathbb{E}\Big(\sum_{t=0}^T \|\mu_i(t) - \mu^*(t)\|\Big)^2 \le (T+1)\sum_{t=0}^T \mathbb{E}\|\mu_i(t) - \mu^*(t)\|^2. \tag{32}$$
Due to the fact that $\big(\sum_{t=0}^T \mathbb{E}\|\mu_i(t) - \mu^*(t)\|\big)^2 \le \mathbb{E}\big(\sum_{t=0}^T \|\mu_i(t) - \mu^*(t)\|\big)^2$, together with (32), we have
$$\sum_{t=0}^T \mathbb{E}\|\mu_i(t) - \mu^*(t)\| \le \sqrt{(T+1)\sum_{t=0}^T \mathbb{E}\|\mu_i(t) - \mu^*(t)\|^2}. \tag{33}$$
Combining inequalities (27)-(33), we have
$$\sum_{t=0}^T \mathbb{E}\|\mu_i(t) - \mu^*(t)\| \le \sqrt{\Big(Q + \frac{16KL\Gamma_T}{o\alpha\ln 2}\Big)(T+1)^{5/6}\ln(T+1)}. \tag{34}$$
Note that $\|\nabla g_i^t(\mu_i(t))\| \le \delta_1$; by the Lipschitz continuity of $g^t$, there holds
$$R_i^d(T) = \sum_{t=0}^T \big(g^t(\mu_i(t)) - g^t(\mu^*(t))\big) \le \sum_{t=0}^T\sum_{j=1}^n \big|g_j^t(\mu_i(t)) - g_j^t(\mu^*(t))\big| \le n\delta_1\sum_{t=0}^T \|\mu_i(t) - \mu^*(t)\| \tag{35}$$
for any $i \in V$. Taking the expectation of $R_i^d(T)$ and substituting (34) into (35) immediately yields (10).

4 A Simulation Example

In this section, we illustrate the performance of the algorithm through a simulation example. A multi-agent system with six agents, labeled by $V = \{1, \ldots, 6\}$, is set up, where each agent transmits information to its neighbors through a directed graph. As shown in Figure 1, there are four possible digraphs (a), (b), (c) and (d), and the switching sequence of the four graphs is $a \to b \to c \to d \to a \to \cdots$. The weight of each edge is set as $x_{ij} = \frac{1}{|N_i(t)|}$, where $|N_i(t)|$ is the number of agent $i$'s neighbors. Obviously, the union of these graphs is strongly connected. Here we set the learning time to $T = 60$. At each time $t \in \mathbb{T}$, the objective function is given by
$$g_i^t(\mu) = a_i\mu^3 + f_i(t)\mu,$$
Figure 1 The communication graph
where $\mu \in \mathbb{R}$ and $i \in V$. In this simulation, the parameters are selected as $a_1 = 0.5$, $a_2 = 0.8$, $a_3 = a_6 = 1$, $a_4 = 0.6$, $a_5 = 0.1$, and $f_i(t)$ is drawn from the uniform distribution on $[-i, i]$. Moreover, the set constraint is given by $\Lambda = \{\mu \mid -10 \le \mu \le 1\}$. For any $t \ge 0$, it is not difficult to verify that $g^t$ is strongly pseudoconvex on $\Lambda$. Now suppose that agent $i$ can only access its local objective function $g_i^t$ and the set $\Lambda$. Algorithm (9) is applied to the problem, and $\mu_i(t) \in \mathbb{R}$ denotes agent $i$'s state. The initial states of the agents are given by $\mu_1(0) = 0.3$, $\mu_2(0) = 0.5$, $\mu_3(0) = 0.4$, $\mu_4(0) = 0.5$, $\mu_5(0) = 0.1$ and $\mu_6(0) = 0.2$. By selecting $\eta_t = \frac{1}{200t + 500}$, the trajectories of the agents' states are shown in Figure 2. Figure 3 displays the average regret of each agent and shows that each average regret decays to zero after a period of time. Thus, the observations are consistent with the results established in Theorem 1.
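The following Python sketch (ours) reproduces the spirit of this experiment under simplifying assumptions: $P = 1$, the box constraint $\Lambda = [-10, 1]$, the learning rates of Theorem 1 with $\alpha = \beta = 1$, and a stand-in switching topology, since the exact digraphs of Figure 1 are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
n, T, lo, hi = 6, 60, -10.0, 1.0
a = np.array([0.5, 0.8, 1.0, 0.6, 0.1, 1.0])
mu = np.array([0.3, 0.5, 0.4, 0.5, 0.1, 0.2])

def weights(t):
    # Stand-in switching topology: one out-neighbor per agent, rotated with t,
    # plus self-loops; rows are normalized so x_ij = 1/|N_i(t)| as in the example
    # (Assumption 1 would additionally require double stochasticity).
    X = np.eye(n)
    for i in range(n):
        X[i, (i + 1 + t % 4) % n] = 1.0
    return X / X.sum(axis=1, keepdims=True)

for t in range(T):
    f = rng.uniform(-np.arange(1, n + 1), np.arange(1, n + 1))   # f_i(t) in [-i, i]
    theta, delta = 1.0 / (t + 1) ** (2 / 3), 1.0 / (t + 1) ** (1 / 6)
    z = weights(t) @ mu                                          # consensus term z_i(t)
    v = rng.uniform(-1, 1, size=n)                               # perturbations v_{i,t}
    F = a * (mu + delta * v) ** 3 + f * (mu + delta * v)         # one cost evaluation
    h_hat = v * F / delta                                        # estimator (8)
    mu = np.clip(z - 0.5 * theta * h_hat, lo, hi)                # update (9) with P = 1

print("final states:", np.round(mu, 3))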
Figure 2 The trajectories of agents' states under algorithm (9)

Figure 3 The trajectory of $R_i(t)/t$, $i = 1, \ldots, 6$
5 Conclusions

This paper studies the online distributed optimization problem with strongly pseudoconvex cost functions. To solve this problem, we propose an online distributed zeroth-order stochastic optimization algorithm based on the single-point gradient estimator. Under this algorithm, each agent updates its action based on its own state, the state information received from its neighbors at the current moment, and the gradient estimate. The performance of the algorithm is measured by the expectation of the dynamic regret when the time-varying graph is G-strongly connected. The result shows that the expected dynamic regret grows sublinearly, provided that the deviation of the minimizer sequence grows within a certain rate. The simulation example in the previous section verifies the effectiveness of the algorithm. Considering practical applications, our future work will focus on more general problems, such as quasiconvex optimization.

References

[1] Akbari M, Gharesifard B, Linder T. Distributed online convex optimization on time-varying directed graphs. IEEE Transactions on Control of Network Systems, 2015, 4(3): 417-428.
[2] Li X, Xie L, Hong Y. Distributed continuous time algorithm for a general nonsmooth monotropic optimization problem. International Journal of Robust and Nonlinear Control, 2019, 29(10): 3252-3266.
[3] Deng Z, Chen T. Distributed algorithm design for constrained resource allocation problems with high-order multi-agent systems. Automatica, 2022, 144: 110492.
[4] Lu K, Zhu Q, Yan X. Distributed ergodic algorithms for mixed equilibrium problems: Absent of cut property. Automatica, 2022, 141: 110297.
[5] Lei J, Chen H. Distributed stochastic approximation algorithm with expanding truncations. IEEE Transactions on Automatic Control, 2019, 65(2): 664-679.
[6] Lu K, Zhu Q. Distributed algorithms involving fixed step size for mixed equilibrium problems with multiple set constraints. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(11): 5254-5260.
[7] Yang T, Lu J, Wu D, et al. A distributed algorithm for economic dispatch over time-varying directed networks with delays. IEEE Transactions on Industrial Electronics, 2016, 64(6): 5095-5106.
[8] Zhang J, You K, Cai K. Distributed dual gradient tracking for resource allocation in unbalanced networks. IEEE Transactions on Signal Processing, 2020, 68: 2186-2198.
[9] Ram S S, Nedić A, Veeravalli V V. Incremental stochastic subgradient algorithms for convex optimization. SIAM Journal on Optimization, 2009, 20(2): 691-717.
[10] Yuan D, Xu S, Zhao H. Distributed primal-dual subgradient method for multiagent optimization via consensus algorithms. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2011, 41(6): 1715-1724.
[11] Nedić A, Ozdaglar A, Parrilo P A. Constrained consensus and optimization in multi-agent networks. IEEE Transactions on Automatic Control, 2010, 55(4): 922-938.
[12] Lu K, Wang L. Online distributed optimization with nonconvex objective functions via dynamic regrets. IEEE Transactions on Automatic Control, 2023.
[13] Zhu Y, Yu W, Wen G, et al. Continuous-time distributed subgradient algorithm for convex optimization with general constraints. IEEE Transactions on Automatic Control, 2018, 64(4): 1694-1701.
[14] Wang H. Experimental implementation of multi-agent distributed optimization algorithms with shortest distance to convex regions. University of California, Riverside, 2016.
[15] Liu H, Yu W, Chen G. Discrete-time algorithms for distributed constrained convex optimization with linear convergence rates. IEEE Transactions on Cybernetics, 2020, 52(6): 4874-4885.
[16] Chen T, Giannakis G B. Bandit convex optimization for scalable and dynamic IoT management. IEEE Internet of Things Journal, 2018, 6(1): 1276-1286.
[17] Agarwal A, Dekel O, Xiao L. Optimal algorithms for online convex optimization with multi-point bandit feedback. COLT — the Conference on Learning Theory, DBLP, 2010: 28-40.
[18] Duchi J C, Jordan M I, Wainwright M J, et al. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory, 2015, 61(5): 2788-2806.
[19] Nesterov Y, Spokoiny V. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 2017, 17: 527-566.
[20] Wang C, Xu S, Yuan D. Distributed online stochastic-constrained convex optimization with bandit feedback. IEEE Transactions on Cybernetics, 2024, 54(1): 63-75.
[21] Sahu A K, Jakovetic D, Bajovic D, et al. Distributed zeroth order optimization over random networks: A Kiefer-Wolfowitz stochastic approximation approach. IEEE Conference on Decision and Control (CDC), IEEE, 2018: 4951-4958.
[22] Pang Y, Hu G. Randomized gradient-free distributed optimization methods for a multiagent system with unknown cost function. IEEE Transactions on Automatic Control, 2019, 65(1): 333-340.
[23] Chen X M, Gao C. Strong consistency of random gradient-free algorithms for distributed optimization. Optimal Control Applications and Methods, 2017, 38(2): 247-265.
[24] Hu X, Wang J. A recurrent neural network for solving a class of general variational inequalities. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2007, 37(3): 528-539.
[25] Konnov I V, Luc D T, Rubinov A M, et al. Some classes of pseudoconvex fractional functions via the Charnes-Cooper transformation. Springer Berlin Heidelberg, 2006.
[26] Forgo F, Joó I. Fixed point and equilibrium theorems in pseudoconvex and related spaces. Journal of Global Optimization, 1999, 14: 27-54.
[27] Barber J R, Klarbring A. Solid mechanics and its applications. Springer, Berlin, 2003.
[28] Lu K, Jing G, Wang L. Online distributed optimization with strongly pseudoconvex-sum cost functions. IEEE Transactions on Automatic Control, 2019, 65(1): 426-433.
[29] Yuan D, Hong Y, Ho D W C, et al. Optimal distributed stochastic mirror descent for strongly convex optimization. Automatica, 2018, 90: 196-203.
[30] Hall E C, Willett R M. Online convex optimization in dynamic environments. IEEE Journal of Selected Topics in Signal Processing, 2015, 9(4): 647-662.
[31] Besbes O, Gur Y, Zeevi A. Non-stationary stochastic optimization. Operations Research, 2015, 63(5): 1227-1244.
[32] Jadbabaie A, Rakhlin A, Shahrampour S, et al. Online optimization: Competing with dynamic comparators. Artificial Intelligence and Statistics, PMLR, 2015: 398-406.
[33] Blondel V D, Hendrickx J M, Olshevsky A, et al. Convergence in multiagent coordination, consensus, and flocking. Proceedings of the 44th IEEE Conference on Decision and Control, IEEE, 2005: 2996-3000.
[34] Lu K, Xu H. Online distributed optimization with strongly pseudoconvex-sum cost functions and coupled inequality constraints. Automatica, 2023, 156: 111203.
[35] Li W, Assaad M. Distributed zeroth-order stochastic optimization in time-varying networks. arXiv preprint arXiv:2105.12597, 2021.
[36] Lu K, Zhu Q, Yan X. Distributed ergodic algorithms for mixed equilibrium problems: Absent of cut property. Automatica, 2022, 141: 110297.

Funding

National Natural Science Foundation of China(62103169)
National Natural Science Foundation of China(51875380)
China Postdoctoral Science Foundation(2021M691313)