Zeroth-Order Methods for Online Distributed Optimization with Strongly Pseudoconvex Cost Functions

Xiaoxi YAN; Muyuan MA; Kaihong LU

doi:10.21078/JSSI-2023-0115

PDF(292 KB)

Journal of Systems Science and Information ›› 2024, Vol. 12 ›› Issue (1) : 145-160. DOI: 10.21078/JSSI-2023-0115

Zeroth-Order Methods for Online Distributed Optimization with Strongly Pseudoconvex Cost Functions

Author information +

History +

Abstract

This paper studies an online distributed optimization problem over multi-agent systems. In this problem, the goal of agents is to cooperatively minimize the sum of locally dynamic cost functions. Different from most existing works on distributed optimization, here we consider the case where the cost function is strongly pseudoconvex and real gradients of objective functions are not available. To handle this problem, an online zeroth-order stochastic optimization algorithm involving the single-point gradient estimator is proposed. Under the algorithm, each agent only has access to the information associated with its own cost function and the estimate of the gradient, and exchange local state information with its immediate neighbors via a time-varying digraph. The performance of the algorithm is measured by the expectation of dynamic regret. Under mild assumptions on graphs, we prove that if the cumulative deviation of minimizer sequence grows within a certain rate, then the expectation of dynamic regret grows sublinearly. Finally, a simulation example is given to illustrate the validity of our results.

Key words

multi-agent systems / strongly pseudoconvex function / single-point gradient estimator / online distributed optimization

Cite this article

EndNote

Ris (Procite)

Bibtex

Download Citations

Xiaoxi YAN , Muyuan MA , Kaihong LU. Zeroth-Order Methods for Online Distributed Optimization with Strongly Pseudoconvex Cost Functions. Journal of Systems Science and Information, 2024, 12(1): 145-160 https://doi.org/10.21078/JSSI-2023-0115

1 Introduction

In recent years, distributed optimization has attracted extensive attention in various fields^[1–5]. This is due to its wide range of practicability in numerous areas such as distributed coordinated control^[6], power grid economic dispatch^[7] and internet of things applications^[8].

In practical applications, various convex optimization problems have been solved recently. Incremental sub-gradient algorithm is one of the earlier methods to solve convex optimization problems^[9]. Furthermore, combining consistency algorithm with sub-gradient algorithm, it is proved that all agents will reach the optimal consistency state in [10] and [11]. In [11] and [12], sub-gradient algorithms based on consistency are proposed for distributed convex optimization problems with or without bounded closed convexity. In [13], a continuous-time sub-gradient algorithm is proposed for solving distributed convex optimization problems with general constraints. Moreover, for the multi-agent rendezvous problem, where the dynamic change of each robot is continuous, a distributed sub-gradient shortest distance rendezvous algorithm is proposed in [14]. In [15], a distributed optimization algorithm based on gradient tracking is proposed for distributed convex optimization problems with set constraints. It is worth noting that all the above works rely on real gradient information.

However, it is not feasible or costly to calculate the gradient information accurately in practical applications. For example, in the internet of things^[16], fog computing can not get the closed expression of delay since its online decision-making needs to adapt to the user preferences and the availability of resources is temporarily unpredictable. Moreover, in bandit optimization^[17], the player can only observe the value of the function, not the specific target function. In order to solve the above problems, zeroth-order stochastic optimization has attracted researchers's attention recently. In [18] and [19], by employing the two-point gradient estimator, several different zeroth-order optimization algorithms have been proposed for unconstrained problem. In [20], an adaptive distributed bandit primal-dual algorithm based on two-point gradient estimator is proposed for the distributed online stochastic convex optimization with time-varying constraints. In [21], a distributed stochastic approximation method of Kieffer-Wolfowitz type is developed for zeroth-order optimization of stochastic networks. In [22], a distributed gradient descent algorithm is proposed for multi-agent optimization problems with set constraints, where the real gradient information is replaced by local construction of random gradients. In [23], the convex optimization problem with time-varying network is discussed in detail, and an algorithm of approximate substitution of sub-gradient by bilateral random gradient is proposed.

It is worth noting that all the aforementioned investigations are conducted for convex optimization problems. However, the problems of pseudoconvex optimization^[24] exist widely in reality such as fractional programming^[25], economics^[26], solid mechanics^[27]. Compared with the convex optimization, pseudoconvex optimization is more general, which also covers some nonconvex cases^[25]. In fact, online distributed optimization problem with strongly pseudoconvex cost function has been studied in ^[28], where real gradient information of objective functions is required. However, the real gradient information is often difficult to be achieved in practical applications, or computing real gradients takes high costs. Motivated by [18, 19, 28], we try to develop zeroth-order methods to deal with online distributed optimization problem with strongly pseudoconvex cost function.

In this paper, the problem of online distributed optimization with strongly pseudoconvex cost functions is studied, where the real gradients of objective functions are not available. Different from [29–32], where the cost functions are assumed to be convex, here cost functions are assumed to be strongly pseudoconvex, which is more general. Both the zeroth-order information and the pseudoconvexity of cost functions will decrease the convergence rate of the online algorithm and enlarge the bound of the dynamic regret. How to deal with this setting and achieve an effective regret bound is a great challenge. To solve the problem, an online zeroth-order stochastic optimization algorithm is proposed. In this algorithm, the auxiliary optimization-based strategy is adopted to update the state of agents, and a single-point gradient estimator is used to estimate the local gradient. When running this algorithm, less real information is used than the algorithm in [28], which makes our algorithm more practical than that in [28]. This is different from [18] and [19], where agent estimates the true gradient based on the two-point gradient estimator, here the single-point gradient estimator can be realized even the values of some random variables change rapidly. The performance of this algorithm is measured by the expectation of dynamic regret. We prove that if the graph is

G

-strongly connected, and the cumulative deviation of the minimizer sequence grows with a certain rate, then the expectation of dynamic regret grows sublinearly.

This paper is organized as follows. In Section 2, We formulate the problem and propose an online zeroth-order stochastic optimization algorithm based on single-point gradient estimator. In Section 3, We state the main results of this paper and give the proof. A simulation example is given in Section 4. The Section 5 is the conclusion of the full text.

Notations: The absolute value of the scalar

μ

is denoted by

| μ |

. The set of real numbers is denoted by

R

N

is used to represent the set of positive integers. For any

T \in N

, we denote set

⌊ T ⌋ = {0, 1, \dots, T}

. We use

R^{M}

to denote

M

-dimensional real vector space. For given vectors

u, v \in R^{M}

, and matrix

P \in R^{M \times M}

, we denote

⟨ u, v ⟩_{P} = ⟨ P u, v ⟩

∥ μ ∥_{P}^{2} = μ^{T} P μ

∥ μ ∥ = \sqrt{μ^{T} μ}

and

∥ μ ∥_{1} = \sum_{j = 1}^{M} | μ_{i} |

, where

μ_{i}

represents the

i^{t h}

entry of vector

μ

. For function

g (\cdot) : R^{M} \to R

, We use

\nabla g (μ)

to represent the gradient of

g (μ)

, and use

\nabla^{2} g (μ)

to denote its Hessian matrix.

I_{n}

is a

n \times n

identity matrix.

1_{M} \in R^{M}

is an

M

-dimensional vector whose elements are all ones. For a matrix

N

[N]_{i j}

denotes the matrix entry in the

i^{t h}

row and

j^{t h}

column,

[N]_{i \cdot}

represents the

i^{t h}

row of the matrix

N

λ_{max} (N)

and

λ_{min} (N)

represent the maximum and minimum eigenvalues of

N

, and denote

∥ N ∥ = \sqrt{λ_{max} (N^{T} N})

2 Problem Formulation

2.1 Basic Graph Theory

A time-varying directed communication graph is set as

G (t) = (V, E (t), X (t))

, where

V = {1, \dots, n}

is a set of vertices,

E (t) \subset V \times V

is an edge set.

X (t) = (x_{i j} (t))_{n \times n}

is a non-negative matrix to represent the weight of adjacent edges, where

x_{i j} (t) > l

for some

l > 0

(j, i) \in E (t)

and

x_{i j} (t) = 0

otherwise. We denote

N_{i} (t) = {j \in V ∣ (j, i) \in E (t)}

as the neighbor set of agent

i

at time

t

, where

i \in N_{i} (t)

for any

i \in V

. For a fixed topology

G = (V, E, X)

r

represents the length of path between node

i_{1}

and node

i_{r + 1}

, where the path is a sequence of

r + 1

distinct nodes

i_{1} \cdot \cdot \cdot i_{r + 1}

and

(i_{u}, i_{u + 1}) \in V

for

u = 1, \cdot \cdot \cdot, r

{G (t)}

is strongly connected if there is a path between any pair of distinct nodes. For

{G (t)}

, an

G

-edge set is defined as

E_{G} (t) = ⋃_{k = t G}^{(t + 1) G - 1} E (k)

for some constant

G > 0

. Under the above conditions, if

{G (t)}

with

V

and

E_{G} (t)

is strongly connected for any

t \geq 0

, then

{G (t)}

is called

G

-strongly connected.

The following assumption is made for the graph.

Assumption 1 For any

t \geq 0

G (t)

is a

G

-strongly connected graph and

X (t)

is a doubly stochastic matrix.

The connectivity of graph in Assumption 1 plays an important role in facilitating agents to achieve a common state^{[12, 33]}. The following lemma in [12] is recalled.

Lemma 1^[12] Under Assumption $1$ , for any $i, j \in V$ , there exists certain $B > 0$ and $0 < λ < 1$ satisfying

\begin{array}{rcl} \begin{aligned} | Ψ (t, m)_{i j} - \frac{1}{n} | \leq B λ^{t - m}, \end{aligned} \end{array}

(1)

where $Ψ (t, m)$ is the state transition matrix of the consensus model $ϱ (t + 1) = X ϱ (t)$ with $ϱ (0) \in R^{n}$ for any $t \geq m \geq 0$ .

2.2 Online Distributed Pseudoconvex Optimization

Consider a multi-agent system consisting of

n

agents, labeled by set

V = {1, \dots, n}

. Agents communicate with each other through the digraph

{G (t)}

. For agent

i \in V

, a sequence of cost functions is given by

{g_{i}^{1}, \dots, g_{i}^{T}}

, where

g_{i}^{t} : Λ \to R

is twice differentiable for any

t \in ⌊ T ⌋

T \in N

is unknown to the agents, and

μ \subset R^{M}

. At each iteration time

t \in ⌊ T ⌋

, agent

i

selects a state

μ_{i} (t) \in Λ

. Then, agent

i

receives a local cost function

g_{i}^{t}

, which means that the cost function information is not available until the agent updates actions. In this case, at each iteration time

t

, each agent attempts to cooperatively solve the following optimization problems

\begin{array}{rcl} \begin{aligned} min g^{t} (μ) = \sum_{i = 1}^{n} g_{i}^{t} (μ), subject to μ \in Λ . \end{aligned} \end{array}

(2)

For any

μ \in Λ

and some random vector

ζ_{i} \in S_{i} \subseteq R^{M}

, denote

F_{i} (μ; ζ_{i})

Λ \times S_{i} ⟶ R

as the local cost function of agent

i

. We use

ζ_{i}

to describe some nonadditive random processes, and the expected local cost function is introduced here

g_{i} (μ) = E_{ζ} [F_{i} (μ; ζ_{i})]

. (2) is rewritten as follows

\begin{array}{rcl} \begin{aligned} min_{μ \in Λ} g^{t} (μ) = \sum_{i = 1}^{n} E_{ζ} [F_{i} (μ; ζ_{i})] . \end{aligned} \end{array}

(3)

Here some basic assumptions are made for the problem.

Assumption 2

Λ

is non-empty, bounded and compact, i.e., for any

p, q \in Λ

, there holds

∥ p - q ∥ \leq κ

, where

κ

is a positive constant. Moreover,

Λ

is a closed convex set.

Assumption 3 For any

t \in ⌊ T ⌋

, each

g^{t}

o

-strongly pseudoconvex on

Λ

i.e., for any

p, q \in Λ

⟨ \nabla g^{t} (p), q - p ⟩ \geq 0

implies

⟨ \nabla g^{t} (q), p - q ⟩ \geq o ∥ q - p ∥^{2}

with some constant

o > 0

Remark 1 In Assumption 3, the gradient of the cost function satisfies strong pseudomonotonicity. Note that, if

\nabla g^{t}

is strongly pseudomonotone, then

g^{t}

is strongly pseudoconvex^[34]. The definitions of pseudoconvex, quasiconvex and convex optimization and their relationships are discussed as follows. For a given differentiable function

g (\cdot) : R^{m} \to R

g (\cdot)

is pseudoconvex on

Λ \subset R^{m}

if for any two different points

u, v \in Λ

⟨ \nabla g (u), v - u ⟩ \geq 0

implies

g (v) - g (u) \geq 0

. Moreover, if for any two different points

u, v \in Λ

⟨ \nabla g (u), v - u ⟩ \geq 0

implies

g (v) - g (u) \geq o / 2 ∥ v - u ∥^{2}

with some constant

o > 0

, then

g (\cdot)

is called a

o

-strongly pseudoconvex function on

Λ

. And

g (\cdot)

is convex on

Λ

if for any two different points

u, v \in Λ

, implies

g (v) - g (u) \geq ⟨ \nabla g (u), v - u ⟩

. Moreover,

g (\cdot)

is quasiconvex on

Λ

if for any two different points

u, v \in Λ

and

0 \leq v \leq 1

, implies

g (v v + (1 - v) u) \leq max {g (u), g (v)}

. According to the above definitions, it is not difficult to verify that quasiconvex function includes convex function and pseudoconvex function, and convex function is a special case of pseudoconvex function. Compared with convex optimization^[30–32], strongly pseudoconvex optimization problems is more general. It is undeniable that quasiconvex is more complex than pseudoconvex, and it is a more general nonconvex problem. In our future works, we will consider the case with quasiconvex.

Assumption 4 For any

i \in V

| F_{i} (μ; ζ_{i}) | \leq C_{1} \leq + \infty

Λ

Any online algorithm should mimic the performance of its offline counterpart, and the gap between them is called regret. If offline benchmark for agents is to minimize

\sum_{t = 0}^{T} g^{t} (μ)

, then the regret is called static regret^[1], which is defined as

\begin{array}{rcl} \begin{aligned} R_{i}^{s} (T) = \sum_{t = 0}^{T} g_{i}^{t} (μ_{i} (t)) - \sum_{t = 0}^{T} g_{i}^{t} (μ^{*}), i \in V, \end{aligned} \end{array}

(4)

where

μ^{*} = \arg min_{μ \in Λ} \sum_{t = 0}^{T} g^{t} (μ)

. Here the offline benchmark for agents is to minimize

g^{t} (μ)

at each time, and such regret is called dynamic regret^[34], which is defined as

\begin{array}{rcl} \begin{aligned} R_{i}^{d} (T) = \sum_{t = 0}^{T} g_{i}^{t} (μ_{i} (t)) - \sum_{t = 0}^{T} g_{i}^{t} (μ^{*} (t)), i \in V, \end{aligned} \end{array}

(5)

where

μ^{*} (t) = \arg min_{μ \in Λ} g^{t} (μ)

for any

t \in ⌊ T ⌋

. The offline benchmark of dynamic regret (5) is more stringent than that of static regret. It is undeniable that dynamic regret may fail in the worst case. Inspired by ^[30–32], we use the following deviation of the minimizer sequence

{μ^{*} (t)}_{t = 0}^{T}

to describe the difficulty:

\begin{array}{rcl} \begin{aligned} Γ_{T} = \sum_{t = 0}^{T} ∥ μ^{*} (t + 1) - μ^{*} (t) ∥ . \end{aligned} \end{array}

(6)

2.3 Online Distributed Algorithms

Consider the following centralized optimization problem

\begin{array}{rcl} \begin{aligned} min g (μ), s u b j e c t t o μ \in Λ, \end{aligned} \end{array}

(7)

where

Λ

is a closed convex set,

g : R^{m} \to R

is a strongly pseudoconvex function, and the real gradients of objective function are not available. To address this problem, motivated by [35], We begin to construct the gradient estimator.

Consider a gradient estimator which is based on a single-point evaluation of local cost function. At each time

t

, agent

i

estimates the gradient of its local cost function with

\begin{array}{rcl} {\hat{h}}_{i} (t) = \frac{v_{i, t} F_{i} (μ_{i} (t) + δ (t) v_{i, t}; ζ_{i, t})}{δ (t)}, \end{array}

(8)

where

δ (t) > 0

represents the step-size and

v_{i, t} \in [- V, V]^{M} \subseteq R^{M}

is a random perturbation vector which is independently generated by each agent.

To solve (2), an online zeroth-order stochastic optimization algorithm is proposed as follows

\begin{array}{rcl} {\begin{array}{c} \begin{aligned} μ_{i} (t + 1) = \arg min_{μ \in Λ} {μ^{T} P μ + ⟨ θ (t) {\hat{h}}_{i} (t) - 2 P z_{i} (t), μ ⟩}, \\ z_{i} (t) = \sum_{j \in N_{i} (t)} x_{i j} μ_{j} (t), \\ {\hat{h}}_{i} (t) = \frac{v_{i, t} F_{i} (μ_{i} (t) + δ (t) v_{i, t}; ζ_{i, t})}{δ (t)}, \end{aligned} \end{array} \end{array}

(9)

for any

i \in V

, where

μ_{i} (t)

is the state of agent

i

used to estimate the minimizer of the objective function at time

t \in ⌊ T ⌋

μ_{i} (0) = μ_{i 0} \in Λ

P

is a symmetric and positive definite matrix. Each agent updates its state by the average-consensus algorithm

θ (t)

and

δ (t)

are positive and decaying learning rates, and initial value

θ (0) = θ_{0} > 0

Motivated by the single-point gradient estimate strategy^[35], consensus algorithm^[36], and the auxiliary optimization algorithm^[33], we propose the algorithm (9). The consensus term

z_{i} (t)

is inspired by the consensus algorithm in [10] and [36]. Under this algorithm, each agent updates actions according to its own state, the state information received from its neighbors at the current moment and the information of gradient estimation. This determines algorithm (9) is distributed and online.

Remark 2 By implementing the algorithm, matrix

P

should be known in prior by all agents, which may prevent the proposed algorithm from being fully distributed. In fact, in a balanced and periodically strongly connected (

G

-strongly connected) communication graph, it is not difficult to determine a common constant

P

for each agent by only using local information. For example, the local initial state of each agent is set to be a symmetric and positive definite matrix

P_{i}

i \in V

. Each agent updates its state by the average-consensus algorithm

\begin{aligned} P_{i} (t + 1) = \sum_{j = 1}^{n} x_{i j} (t) P_{j} (t), \end{aligned}

then all agents' states converge to

\frac{1}{n} \sum_{i = 1}^{n} P_{i}

, which helps each one achieve a common symmetric and positive definite matrix

P

3 Main Results

Theorem 1 Under Assumptions $1$ , $2$ , $3$ and $4$ , if the learning rates are selected to be $θ (t) = \frac{α}{(t + 1)^{\frac{2}{3}}}$ , $δ (t) = \frac{β}{(t + 1)^{\frac{1}{6}}}$ , where $α, β > 0$ , then for any $i \in V$ and learning time $T \in N$ , we have

\begin{array}{rcl} \begin{aligned} E [R_{i}^{d} (T)] \leq n δ_{1} \sqrt{Q + \frac{16 K L Γ_{T}}{o α \ln 2}} ((T + 1)^{5 / 6} \sqrt{\ln (T + 1)}), \end{aligned} \end{array}

(10)

where $Q = \frac{C_{2} + 3 C_{3} α^{2} / β^{2} + 24 C_{5} λ α / (o β^{2}) + 12 κ M^{\frac{3}{2}} V^{2} σ_{1} / (o α)}{λ (1 - λ) \ln 2} + \frac{4 d}{o α}$ , $ξ = B \sqrt{M} \sum_{i = 1}^{n} ∥ μ_{i} (0) ∥_{1}$ , $S_{1} = ξ^{2} + \frac{n ξ B C_{1} M V θ_{0}}{σ (1 - λ) δ_{0}}$ , $S_{2} \frac{(n B M V C_{1})^{2}}{4 σ^{2} (1 - λ)}$ , $C_{2} = 4 C_{4} / (o α) + 2 S_{1}$ , $C_{3} = 20 n L S_{2} / (o α) + 2 S_{2}$ , $C_{4} = 5 n L S_{1} + n (κ σ_{1} + δ_{1}) α$ , $C_{5} = 5 n L S_{2}$ , $d = L κ$ , $K = sup_{μ \in Λ} ∥ μ ∥$ , $δ_{1} = sup_{t \in ⌊ T ⌋, i \in V, μ \in Λ} ∥ \nabla g_{i}^{t} (μ_{i} (t)) ∥$ , $L = n λ_{max} (P)$ , $σ = λ_{min} (P)$ , $σ_{1} = sup_{t \in ⌊ T ⌋, i \in V, μ \in Λ} ∥ \nabla^{2} g_{i}^{t} (μ_{i} (t)) ∥$ .

Remark 3 By Theorem 1, we can see that

Γ_{T}

plays an important role in the sublinear boundary of dynamic regret. It can be noted that

lim_{T \to \infty} \frac{(T + 1)^{5 / 6} \sqrt{\ln (T + 1)}}{T} = 0

. If

Γ_{T}

sublinearly grows with

\frac{(T + 1)^{1 / 3}}{\ln (T + 1)}

, then

lim_{T \to \infty} \frac{\sqrt{Γ_{T}} (T + 1)^{5 / 6} \sqrt{\ln (T + 1)}}{T} = 0

, which implies online distributed algorithm (9) solves problem (1) well. Therefore, the algorithm (9) is suitable for solving the strongly pseudoconvex optimization problem, where the real gradients of objective function are not available. If the amplitude of minimizer sequence

{μ^{*} (t)}_{t = 0}^{T}

change is drastic,

Γ_{T}

might become linear with

\frac{(T + 1)^{1 / 3}}{\ln (T + 1)}

, then the performance of algorithm (9) cannot be guaranteed. This is natural since the problem is also unsolvable in the worst case, even in online convex optimization^[30–32]. In convergence analysis, there is an error between the estimate gradient and the real gradient, which enlarges the bound of the dynamic regret. In order to overcome this difficulty, we construct a new smooth function to establish the relationship between the estimate gradient and the real gradient. By selecting an appropriate step-size of the smoothing function and an appropriate learning rate of the online algorithm, we prove that the expectation of the regret function increases sublinearly. In addition, the convergence rate of this algorithm is slower than that of [28]. This is normal since the algorithm uses less information and does not directly use the real gradient information.

Before giving the proof of Theorem 1, some useful lemmas need to be provided. First, the Karush-Kuhn-Tucker (KKT) condition for pseudoconvex optimization is recalled.

Lemma 2^[1] $μ^{*}$ is a minimum point of $g (μ)$ on $Υ$ if and only if $⟨ \nabla g (μ^{*}), μ - μ^{*} ⟩ \geq 0$ , $\forall μ \in Υ$ .

Lemma 3^[35] Suppose that for any $i \in V$ and $t \in ⌊ T ⌋$ , the random perturbation vector $∥ v_{i t} ∥\leq \sqrt{M} V$ , and $∥ {\hat{h}}_{i} (t) ∥\leq \frac{C_{1} \sqrt{M} V}{δ (t)}$ , there holds $\nabla g_{i} (μ_{i} (t)) = E [{\hat{h}}_{i} (t) | μ_{i} (t)] - b_{i} (t)$ , where $b_{i} (t)$ represents a estimation bias of the gradient estimator with the property $∥ b_{i} (t) ∥\leq \frac{1}{2} M^{\frac{3}{2}} V^{2} σ_{1} δ (t)$ .

Now we present the following lemma, which gives the upper bound of the error of each agent's state and the average value in each iteration time under algorithm (9).

Lemma 4 Under Assumptions $1$ , $2$ , $3$ and $4$ , for any $t \in ⌊ T ⌋$ ,

\begin{array}{rcl} \begin{aligned} ∥ μ_{i} (t) - \bar{μ} (t) ∥ \leq ξ λ^{t} + \frac{n B M V C_{1}}{σ} \sum_{τ = 0}^{t} λ^{t - τ} \frac{θ (τ)}{δ (τ)} \end{aligned} \end{array}

(11)

and

\begin{array}{rcl} \begin{aligned} ∥ μ_{i} (t) - \bar{μ} (t) ∥^{2} \leq S_{1} λ^{t} + S_{2} \sum_{τ = 0}^{t} λ^{t - τ} {(\frac{θ (τ)}{δ (τ)})}^{2}, \end{aligned} \end{array}

(12)

where

\bar{μ} (t) = \frac{1}{n} \sum_{i = 1}^{n} μ_{i} (t)

and

i \in V

Proof. Using KKT condition to measure the the first equation of algorithm (9), there holds

\begin{array}{rcl} \begin{aligned} ⟨ μ_{i} (t + 1) - z_{i} (t), μ_{i} (t + 1) - μ ⟩_{P} \leq \frac{θ (t)}{2} ⟨ {\hat{h}}_{i} (t), μ - μ_{i} (t + 1) ⟩, \end{aligned} \end{array}

for any

μ \in Λ

. By the convexity of

Λ

, we have

z_{i} (t) \in Λ

for any

μ_{i} (t) \in Λ

and

i \in V

. Let

μ = z_{i} (t)

, together with the facts that

2 σ ∥ μ_{i} (t + 1) - z_{i} (t) ∥^{2} \leq∥ μ_{i} (t + 1) - z_{i} (t) ∥_{P}^{2}

and

∥ {\hat{h}}_{i} (t) ∥\leq \frac{C_{1} \sqrt{M} V}{δ (t)}

, we have

\begin{array}{rcl} \begin{aligned} 2 σ ∥ μ_{i} (t + 1) - z_{i} (t) ∥^{2} \leq \frac{C_{1} \sqrt{M} V θ (t)}{2 δ (t)} ∥ μ_{i} (t + 1) - z_{i} (t) ∥ . \end{aligned} \end{array}

(13)

Let us denote

q_{i} (t) = μ_{i} (t + 1) - z_{i} (t)

, from (13), there holds

∥ q_{i} (t) ∥ \leq \frac{C_{1} \sqrt{M} V θ (t)}{2 σ δ (t)}

. Moreover,

\begin{array}{rcl} μ_{i} (t + 1) = \sum_{j \in N_{i} (t)} x_{i j} μ_{j} (t) + q_{i} (t) . \end{array}

Let

{\tilde{μ}}_{γ} (t) \in R^{n}

and

{\tilde{q}}_{γ} (t) \in R^{n}

to represent the stack of the

γ^{t h}

term of

μ_{i} (t)

and the stack of the

γ^{t h}

term of

q_{γ} (t)

, respectively, where

i \in V

. Their relationship is as follows

\begin{array}{rcl} {\tilde{μ}}_{γ} (t + 1) = X (t) {\tilde{μ}}_{γ} (t) + {\tilde{q}}_{γ} (t) . \end{array}

which implies that

\begin{array}{rcl} {\tilde{μ}}_{γ} (t) = Ψ (t, 0) {\tilde{μ}}_{γ} (0) + \sum_{τ = 1}^{t} Ψ (t, τ) {\tilde{q}}_{γ} (τ) . \end{array}

(14)

From (1) we know that

Ψ (t, m)

is a doubly stochastic matrix for any

t \geq m \geq 0

, then

\begin{array}{rcl} 1^{T} {\tilde{μ}}_{γ} (t) = 1^{T} {\tilde{μ}}_{γ} (0) + \sum_{τ = 1}^{t} 1^{T} {\tilde{q}}_{γ} (τ - 1) . \end{array}

(15)

From (14) and (15), there holds

\begin{array}{rcl} \begin{aligned} | [{\tilde{μ}}_{γ} (t)]_{i} - \frac{1}{n} 1^{T} {\tilde{μ}}_{γ} (t) | \\ \leq | ([Ψ (t, 0)]_{i \cdot} - \frac{1}{n} 1^{T}) {\tilde{μ}}_{γ} (0) | + \sum_{τ = 1}^{t} 1^{T} | ([Ψ (t, τ)]_{i \cdot} - \frac{1}{n} 1^{T}) {\tilde{q}}_{γ} (τ - 1) | \\ \leq max_{1 \leq j \leq n} | [Ψ (t, 0)]_{i j} - \frac{1}{n} | ∥ {\tilde{μ}}_{γ} (0) ∥_{1} + \\ \frac{n C_{1} \sqrt{M} V}{2 σ} \sum_{τ = 1}^{t} \frac{θ (τ - 1)}{δ (τ - 1)} max_{1 \leq j \leq n} | [Ψ (t, τ)]_{i j} - \frac{1}{n} |, \end{aligned} \end{array}

for any

i \in V

. By (1), there holds

\begin{array}{rcl} \begin{aligned} | [{\tilde{μ}}_{γ} (t)]_{i} - \frac{1}{n} 1^{T} {\tilde{μ}}_{γ} (t) | \leq B λ^{t} ∥ {\tilde{μ}}_{γ} (0) ∥_{1} + \frac{n C_{1} \sqrt{M} V B}{2 σ} \sum_{τ = 0}^{t} λ^{t - τ} \frac{θ (τ)}{δ (τ)} . \end{aligned} \end{array}

This result proves (11). Moreover, due to the facts that

\frac{θ (t)}{δ (t)}

is non-increasing and

0 < λ < 1

, we have

\begin{array}{rcl} \begin{aligned} ∥ μ_{i} (t) - \bar{μ} (t) ∥^{2} \leq (ξ^{2} + \frac{ξ H n C_{1} M V θ_{0}}{σ (1 - λ) δ_{0}}) λ^{t} + \frac{(n M H C_{1} V)^{2}}{4 σ^{2}} {(\sum_{τ = 0}^{t} λ^{t - τ} \frac{θ (τ)}{δ (τ)})}^{2} . \end{aligned} \end{array}

(16)

Using Cauchy-Schwarz inequality yields

\begin{array}{rcl} \begin{aligned} {(\sum_{τ = 0}^{t} λ^{t - τ} \frac{θ (τ)}{δ (τ)})}^{2} \leq (\sum_{τ = 0}^{t} λ^{t - τ}) (\sum_{τ = 0}^{t} λ^{t - τ} {(\frac{θ (τ)}{δ (τ)})}^{2}) \\ \leq \frac{1}{1 - λ} \sum_{τ = 0}^{t} λ^{t - τ} {(\frac{θ (τ)}{δ (τ)})}^{2} . \end{aligned} \end{array}

(17)

Substituting (17) into (16) proves the validity of (12).

Then we give the cumulative upper bound of the error between the average state of the agent and the minimum point.

Lemma 5 Under Assumptions $1$ , $2$ and $3$ , if $θ (t)$ is non-increasing, there holds

\begin{array}{rcl} \begin{aligned} \sum_{t = 0}^{T} ∥ \bar{μ} (t) - μ^{*} (t) ∥^{2} \\ \leq \frac{2 C_{4}}{o θ (T)} \sum_{t = 0}^{T} λ^{t} + \frac{10 n L S_{2}}{o θ (T)} \sum_{t = 0}^{T} \sum_{τ = 0}^{t + 1} λ^{t - τ} {(\frac{θ (t)}{δ (t)})}^{2} \\ + \frac{8 K L Γ_{T}}{o θ (T)} + \frac{2 \sum_{t = 0}^{T} ω}{o θ (T)} + \frac{2 d}{o θ (T)} + \frac{2 C_{5}}{o θ (T)} \sum_{t = 0}^{T} \sum_{τ = 0}^{t + 1} λ^{t - τ} \frac{(θ (t)^{2})}{δ (t)} . \end{aligned} \end{array}

(18)

To prove Lemma 5, we give an auxiliary lemma as follows.

Lemma 6 Under Assumptions $1$ , $2$ and $3$ , for any $y \in R^{M}$ and $t \in ⌊ T ⌋$ ,

\begin{array}{rcl} \begin{aligned} \sum_{i = 1}^{n} ⟨ μ_{i} (t) - z_{i} (t), y - μ_{i} (t + 1) ⟩_{P} \leq \frac{5 n L}{2} S_{1} \sum_{i = 1}^{n} λ^{t} + \frac{5 n L}{2} S_{2} \sum_{τ = 0}^{t + 1} λ^{t - τ} {(\frac{θ (τ)}{δ (τ)})}^{2} . \end{aligned} \end{array}

(19)

Proof. Note that

\begin{array}{rcl} \begin{aligned} \sum_{i = 1}^{n} ⟨ μ_{i} (t) - z_{i} (t), y - μ_{i} (t + 1) ⟩_{P} \\ = \sum_{i = 1}^{n} ⟨ μ_{i} (t) - z_{i} (t), y - \bar{μ} (t + 1) ⟩_{P} + \sum_{i = 1}^{n} ⟨ μ_{i} (t) - z_{i} (t), \bar{μ} (t + 1) - μ_{i} (t + 1) ⟩_{P} \\ \leq ⟨ \sum_{i = 1}^{n} (μ_{i} (t) - z_{i} (t)), y - \bar{μ} (t + 1) ⟩_{P} + L \sum_{i = 1}^{n} ∥ μ_{i} (t) - z_{i} (t) ∥ ∥ \bar{μ} (t + 1) - μ_{i} (t + 1) ∥ . \end{aligned} \end{array}

Then, note that

⟨ \sum_{i = 1}^{n} (μ_{i} (t) - z_{i} (t)), y - \bar{μ} (t + 1) ⟩_{P} = 0

. By using Young's inequality, there holds

\begin{array}{rcl} \begin{aligned} \sum_{i = 1}^{n} ⟨ μ_{i} (t) - z_{i} (t), y - μ_{i} (t + 1) ⟩_{P} \\ \leq 2 L \sum_{i = 1}^{n} ∥ μ_{i} (t) - \bar{μ} (t) ∥^{2} + \frac{L}{2} \sum_{i = 1}^{n} ∥ \bar{μ} (t + 1) - μ_{i} (t + 1) ∥^{2} . \end{aligned} \end{array}

(20)

Using (12) and substituting

1 > λ > 0

into (20) it yields (19).

Proof. (Proof of Lemma 5) Note that,

\begin{array}{rcl} \begin{aligned} \frac{1}{2} ∥ μ^{*} (t + 1) - μ_{i} (t + 1) ∥_{p}^{2} - \frac{1}{2} ∥ μ^{*} (t) - μ_{i} (t) ∥_{p}^{2} \\ = - \frac{1}{2} ∥ μ_{i} (t) - μ_{i} (t + 1) ∥_{p}^{2} + ⟨ μ_{i} (t) - μ_{i} (t + 1), μ^{*} (t) - μ_{i} (t + 1) ⟩_{p} \\ + \sum_{i = 1}^{n} ⟨ \frac{1}{2} (μ^{*} (t + 1) + μ^{*} (t)) - μ_{i} (t + 1), μ^{*} (t + 1) - μ^{*} (t) ⟩_{p} . \end{aligned} \end{array}

(21)

for any

i \in V

. Now we denote

D (t) = \frac{1}{2} \sum_{i = 1}^{n} ∥ μ^{*} (t) - μ_{i} (t) ∥_{p}^{2}

, there holds

\begin{array}{rcl} \begin{aligned} \nabla D (t) = D (t + 1) - D (t) \\ = - \frac{1}{2} \sum_{i = 1}^{n} ∥ μ_{i} (t) - μ_{i} (t + 1) ∥_{p}^{2} + \sum_{i = 1}^{n} ⟨ μ_{i} (t) - μ_{i} (t + 1), μ^{*} (t) - μ_{i} (t + 1) ⟩_{p} \\ + \sum_{i = 1}^{n} ⟨ \frac{1}{2} (μ^{*} (t + 1) + μ^{*} (t)) - μ_{i} (t + 1), μ^{*} (t + 1) - μ^{*} (t) ⟩_{p} \\ \leq \sum_{i = 1}^{n} ⟨ μ_{i} (t) - μ_{i} (t + 1), μ^{*} (t) - μ_{i} (t + 1) ⟩_{p} - \sum_{i = 1}^{n} \frac{σ}{2} ∥ μ_{i} (t) - μ_{i} (t + 1) ∥^{2} \\ + 2 k L ∥ μ^{*} (t + 1) - μ^{*} (t) ∥ . \end{aligned} \end{array}

(22)

Then, by KKT condition, there holds

\begin{array}{rcl} \begin{aligned} ⟨ z_{i} (t) - μ_{i} (t + 1), μ^{*} (t) - μ_{i} (t + 1) ⟩_{p} \leq \frac{θ (t)}{2} ⟨ {\hat{h}}_{i} (t), μ^{*} (t) - μ_{i} (t + 1) ⟩, \end{aligned} \end{array}

for any

i \in V

. By (19) we have

\begin{array}{rcl} \begin{aligned} \sum_{i = 1}^{n} ⟨ μ_{i} (t) - μ_{i} (t + 1), μ^{*} (t) - μ_{i} (t + 1) ⟩_{p} \\ = \sum_{i = 1}^{n} ⟨ z_{i} (t) - μ_{i} (t + 1), μ^{*} (t) - μ_{i} (t + 1) ⟩_{p} + \sum_{i = 1}^{n} ⟨ μ_{i} (t) - z_{i} (t), μ^{*} (t) - μ_{i} (t + 1) ⟩_{p} \\ \leq \frac{θ (t)}{2} ⟨ {\hat{h}}_{i} (t), μ^{*} (t) - μ_{i} (t + 1) ⟩ + \frac{5 n L}{2} S_{1} \sum_{i = 1}^{n} λ^{t} + \frac{5 n L}{2} S_{2} \sum_{τ = 0}^{t + 1} λ^{t - τ} {(\frac{θ (τ)}{δ (τ)})}^{2} . \end{aligned} \end{array}

(23)

Under Assumption 3, by Lemma 2, we have

\begin{array}{rcl} ⟨ \sum_{i = 1}^{n} \nablag_{i}^{t} (\bar{μ} (t)), \bar{μ} (t) - μ^{*} (t) ⟩ \geq \frac{o}{2} ∥ \bar{μ} (t) - μ^{*} (t) ∥^{2} . \end{array}

Note that

∥ \nabla^{2} g_{i}^{t} ∥ \leq σ_{1}

for any

μ \in Λ

, we have

∥ \nabla g_{i}^{t} (μ_{i} (t)) - \nabla g_{i}^{t} (\bar{μ} (t)) ∥ \leq σ_{1} ∥ μ_{i} (t) - \bar{μ} (t) ∥

for any

i \in V

. Since

Λ

in Assumption 2 is bounded, there holds

\begin{array}{rcl} \begin{aligned} \sum_{i = 1}^{n} ⟨ \nabla g_{i}^{t} (μ_{i} (t)), μ^{*} (t) - μ_{i} (t) ⟩ \\ = \sum_{i = 1}^{n} ⟨ \nabla g_{i}^{t} (μ_{i} (t)) - \nabla g_{i}^{t} (\bar{μ} (t)), μ^{*} (t) - μ_{i} (t) ⟩ \\ + \sum_{i = 1}^{n} ⟨ \nabla g_{i}^{t} (\bar{μ} (t)), \bar{μ} (t) - μ_{i} (t) ⟩ - \sum_{i = 1}^{n} ⟨ \nabla g_{i}^{t} (\bar{μ} (t)), \bar{μ} (t) - μ^{*} (t) ⟩ \\ \leq \sum_{i = 1}^{n} (κ σ_{1} + δ_{1}) ∥ μ_{i} (t) - \bar{μ} (t) ∥ - \frac{o}{2} ∥ \bar{μ} (t) - μ^{*} (t) ∥^{2} . \end{aligned} \end{array}

Then,

\begin{array}{rcl} \begin{aligned} θ (t) \sum_{i = 1}^{n} ⟨ {\hat{h}}_{i} (t), μ^{*} (t) - μ_{i} (t + 1) ⟩ \\ = θ (t) \sum_{i = 1}^{n} ⟨ {\hat{h}}_{i} (t), μ^{*} (t) - μ_{i} (t) ⟩ + θ (t) \sum_{i = 1}^{n} ⟨ {\hat{h}}_{i} (t), μ_{i} (t) - μ_{i} (t + 1) ⟩ \\ = θ (t) \sum_{i = 1}^{n} ⟨ {\hat{h}}_{i} (t) - \nabla g_{i}^{t} (μ_{i} (t)), μ^{*} (t) - μ_{i} (t) ⟩ \\ + θ (t) \sum_{i = 1}^{n} ⟨ \nabla g_{i}^{t} (μ_{i} (t)), μ^{*} (t) - μ_{i} (t) ⟩ \\ + θ (t) \sum_{i = 1}^{n} ⟨ {\hat{h}}_{i} (t), μ_{i} (t) - μ_{i} (t + 1) ⟩ . \end{aligned} \end{array}

(24)

By using Young's inequality, we have

\begin{array}{rcl} \begin{aligned} θ (t) \sum_{i = 1}^{n} ⟨ {\hat{h}}_{i} (t), μ_{i} (t) - μ_{i} (t + 1) ⟩ \leq \sum_{i = 1}^{n} \frac{σ}{2} ∥ μ_{i} (t) - μ_{i} (t + 1) ∥^{2} + \frac{n M V^{2} C_{1}^{2} θ (t)^{2}}{2 σ δ (t)^{2}} . \end{aligned} \end{array}

(25)

we denote

ω = θ (t) \sum_{i = 1}^{n} ⟨ {\hat{h}}_{i} (t) - \nabla g_{i}^{t} (μ_{i} (t)), μ^{*} (t) - μ_{i} (t) ⟩

, by (22)

\sim

(25) and using (9), there holds

\begin{array}{rcl} \begin{aligned} \nabla D (t) \leq \frac{1}{2} ω - \frac{o θ (t)}{4} ∥ \bar{μ} (t) - μ^{*} (t) ∥^{2} + \frac{n M V^{2} C_{1}^{2}}{4 σ} {(\frac{θ (t)}{δ (t)})}^{2} + 2 K L ∥ μ^{*} (t) - μ^{*} (t + 1) ∥ \\ + \frac{C_{4}}{2} λ^{t} + \frac{C_{5}}{2} \sum_{τ = 0}^{t + 1} λ^{t - τ} \frac{{(θ (t))}^{2}}{δ (t)} + \frac{5 n L S_{2}}{2} \sum_{τ = 0}^{t + 1} λ^{t - τ} {(\frac{θ (t)}{δ (t)})}^{2} . \end{aligned} \end{array}

(26)

Due to the fact

\nabla D (t) \geq 0

for any

t \in ⌊ T ⌋

, there holds

- \sum_{t = 0}^{T} \nabla D (t) = D (0) - D (T) \geq D (0) \geq d / 2

. By accumulating the two sides of (26) from

t = 0

T

, we can achieve (18) in Lemma 5.

Proof. (Proof of Theorem

1

) By Lemma 4 and Lemma 5, for any

i \in V

, there holds

\begin{array}{rcl} \begin{aligned} \sum_{t = 0}^{T} ∥ μ_{i} (t) - μ^{*} (t) ∥^{2} \\ \leq 2 \sum_{t = 0}^{T} ∥ \bar{μ} (t) - μ^{*} (t) ∥^{2} + 2 \sum_{t = 0}^{T} ∥ μ_{i} (t) - \bar{μ} (t) ∥^{2} \\ \leq (\frac{4 C_{4}}{o θ (T)} + 2 S_{1}) \sum_{t = 0}^{T} λ^{t} + \frac{16 K L Γ_{T}}{o θ (T)} + \frac{4 \sum_{t = 0}^{T} ω}{o θ (T)} \\ + (\frac{20 n L S_{2}}{o θ (T)} + 2 S_{2}) \sum_{t = 0}^{T} \sum_{τ = 0}^{t + 1} λ^{t - τ} {(\frac{θ (t)}{δ (t)})}^{2} \\ + \frac{4 C_{5}}{o θ (T)} \sum_{t = 0}^{T} \sum_{τ = 0}^{t + 1} λ^{t - τ} \frac{(θ (t)^{2})}{δ (t)} + \frac{4 d}{o θ (T)} . \end{aligned} \end{array}

(27)

Let

θ (t) = \frac{α}{(t + 1)^{\frac{2}{3}}}

and

δ (t) = \frac{β}{(t + 1)^{\frac{1}{6}}}

. Note that

\sum_{t = 0}^{T} \frac{1}{1 + t} = \sum_{t = 1}^{T + 1} \frac{1}{t} \leq \int_{t = 1}^{T + 1} \frac{1}{t} d t = 1 + \ln (T + 1)

. Similarly,

\sum_{t = 0}^{T} \frac{1}{(1 + t)^{\frac{7}{6}}} \leq - 6 (T + 1)^{- \frac{1}{6}} + 6

\sum_{t = 0}^{T} \frac{1}{(1 + t)^{\frac{5}{6}}} \leq= 6 (T + 1)^{\frac{1}{6}} - 6

and

\sum_{t = 0}^{T} λ^{t} \leq \frac{1}{1 - λ}

, we have

\begin{array}{rcl} \begin{aligned} \sum_{t = 0}^{T} \sum_{τ = 0}^{t + 1} λ^{t - τ} {(\frac{θ (t)}{δ (t)})}^{2} \\ = \sum_{τ = 1}^{T + 1} \sum_{t = τ - 1}^{T} λ^{t - τ} {(\frac{θ (t)}{δ (t)})}^{2} + \sum_{t = 0}^{T} λ^{t} {(\frac{θ_{0}}{δ_{0}})}^{2} \\ \leq (\sum_{τ = 1}^{T + 1} {(\frac{θ (t)}{δ (t)})}^{2}) (\sum_{t = 0}^{T} λ^{t}) + \sum_{t = 0}^{T} λ^{t} {(\frac{θ_{0}}{δ_{0}})}^{2} \\ \leq \frac{3 α^{2} \ln (T + 1)}{β^{2} λ (1 - λ) \ln 2}, \end{aligned} \end{array}

(28)

where the first equation is established by changing the order of summation. Note that for any

T \geq 1

yields

\ln (T + 1) \geq \ln 2

, which leads to the result of the last inequality. Similarly, we have

\sum_{t = 0}^{T} \sum_{τ = 0}^{t + 1} λ^{t - τ} \frac{(θ (t))^{2}}{δ (t)} \leq \frac{6 α^{2} λ}{β^{2} λ (1 - λ)}

From Lemma 2, taking the total expectation for

ω

yields

\begin{array}{rcl} \begin{aligned} \sum_{t = 0}^{T} E [ω] \leq \frac{1}{2} κ M^{\frac{3}{2}} V^{2} σ_{1} \sum_{t = 0}^{T} θ (t) δ (t) \\ \leq \frac{1}{2} κ M^{\frac{3}{2}} V^{2} σ_{1} (6 (T + 1)^{\frac{1}{6}} - 6) \\ \leq 3 κ M^{\frac{3}{2}} V^{2} σ_{1} (T + 1)^{\frac{1}{6}} . \end{aligned} \end{array}

(29)

Taking expectation on both sides of the inequality (27) results in that

\begin{array}{rcl} \begin{aligned} \sum_{t = 0}^{T} E ∥ μ_{i} (t) - μ^{*} (t) ∥^{2} \\ \leq (\frac{4 C_{4}}{o θ (T)} + 2 S_{1}) \sum_{t = 0}^{T} λ^{t} + \frac{16 K L Γ_{T}}{o θ (T)} + \frac{12 κ M^{\frac{3}{2}} V^{2} σ_{1} (T + 1)^{\frac{1}{6}}}{o θ (T)} \\ + (\frac{20 n L S_{2}}{o θ (T)} + 2 S_{2}) \sum_{t = 0}^{T} \sum_{τ = 0}^{t + 1} λ^{t - τ} {(\frac{θ (t)}{δ (t)})}^{2} + \frac{4 C_{5}}{o θ (T)} \sum_{t = 0}^{T} \sum_{τ = 0}^{t + 1} λ^{t - τ} \frac{(θ (t)^{2})}{δ (t)} + \frac{4 d}{o θ (T)} . \end{aligned} \end{array}

(30)

Then, using Jensen's inequality, we have

\begin{array}{rcl} \begin{aligned} {(\sum_{t = 0}^{T} ∥ μ_{i} (t) - μ^{*} (t) ∥)}^{2} \leq (T + 1) \sum_{t = 0}^{T} ∥ μ_{i} (t) - μ^{*} (t) ∥^{2} . \end{aligned} \end{array}

(31)

we can take expectation on both sides of (31) to get

\begin{array}{rcl} \begin{aligned} E {(\sum_{t = 0}^{T} ∥ μ_{i} (t) - μ^{*} (t) ∥)}^{2} \leq (T + 1) \sum_{t = 0}^{T} E ∥ μ_{i} (t) - μ^{*} (t) ∥^{2} . \end{aligned} \end{array}

(32)

Due to the fact

(\sum_{t = 0}^{T} E ∥ μ_{i} (t) - μ^{*} (t) ∥)^{2} \leq E (\sum_{t = 0}^{T} ∥ μ_{i} (t) - μ^{*} (t) ∥)^{2}

, and by (32), we have

\begin{array}{rcl} \begin{aligned} \sum_{t = 0}^{T} E ∥ μ_{i} (t) - μ^{*} (t) ∥ \leq \sqrt{(T + 1) \sum_{t = 0}^{T} E ∥ μ_{i} (t) - μ^{*} (t) ∥^{2}} . \end{aligned} \end{array}

(33)

By inequalities (27)

\sim

(33), we have

\begin{array}{rcl} \begin{aligned} \sum_{t = 0}^{T} E ∥ μ_{i} (t) - μ^{*} (t) ∥ \leq \sqrt{Q + \frac{16 K L Γ_{T}}{o α \ln 2}} ((T + 1)^{5 / 6} \sqrt{\ln (T + 1)}) . \end{aligned} \end{array}

(34)

Note that

∥ \nabla g_{i}^{t} (μ_{i} (t)) ∥ \leq δ_{1}

and due to the Lipschitz continuity of

g^{t}

, there holds

\begin{array}{rcl} \begin{aligned} R_{i}^{d} (T) = \sum_{t = 0}^{T} (g^{t} (μ_{i} (t)) - g^{t} (μ^{*} (t))) \\ \leq \sum_{t = 0}^{T} \sum_{j = 1}^{n} ∥ g_{j}^{t} (μ_{i} (t)) - g_{j}^{t} (μ^{*} (t)) ∥ \\ \leq n δ_{1} \sum_{t = 0}^{T} ∥ μ_{i} (t) - μ^{*} (t) ∥ . \end{aligned} \end{array}

(35)

for any

i \in V

. Taking the expectation of

R_{i}^{d} (T)

and substituting (34) into (35), it immediately implies (10).

4 A Simulation Example

In this section, we illustrate the performance of the algorithm through a simulation example. A multi-agent system is set up with six agents, labeled as

V = {1, \dots, 6}

, where each agent transmits information to its neighbors through a directed graph. As shown in Figure 1, there are

a

b

c

d

four possible digraph. The switching sequence of four graphs is

a \to b \to c \to d \to a \to \dots

. The weight of each edge is assumed to be

x_{i j} = \frac{1}{| N_{i} (t) |}

, where

| N_{i} (t) |

is the number of agent

i

's neighbors. Obviously, the union of these graphs is strongly connected. Here we set the learning time to be

T = 60

. At each time

t \in ⌊ T ⌋

, the objective function is given as follows

\begin{array}{rcl} \begin{aligned} g_{i}^{t} (μ) = a_{i} μ^{3} + f_{i} (t) μ, \end{aligned} \end{array}

Figure 1 The communication graph

Full size|PPT slide

where

μ \in R

and

i \in V

. In this simulation, parameters are selected as

a_{1} = 0.5

a_{2} = 0.8

a_{3} = a_{6} = 1

a_{4} = 0.6

a_{5} = 0.1

f_{i} (t)

is randomly selected from

[- i, i]

and subject to a uniform distribution. Moreover, the set constraint is given by

Λ = {μ | - 10 \leq μ \leq - 1}

. For any

t \geq 0

, it is not difficult to verify that

g^{t}

is strongly pseudoconvex on

Λ

. Now suppose that agent

i

can only access its local objective function

g_{i}^{t}

and set

Λ

. Algorithm (14) is applied to the problem, and

μ_{i} (t) \in R

is used to represent agent

i

's state. Initial states of agents are given by:

μ_{1} (0) = 0.3,

μ_{2} (0) = - 0.5,

μ_{3} (0) = - 0.4,

μ_{4} (0) = 0.5,

μ_{5} (0) = - 0.1

and

μ_{6} (0) = - 0.2

. By selecting

η_{t} = 1 / \sqrt{200 t + 500}

, the trajectories of agents' states are shown in Figure 2. Figure 3 displays the average regrets of all agents respectively and shows that each average regret decays to zero after a period of time. Thus, the observations are consistent with results established in Theorem 1.

Figure 2 The trajectories of agents' states under algorithm (9)

Full size|PPT slide

Figure 3 The trajectory of $R_{i} (t) / t$ , $i = 1, \dots, 6$

Full size|PPT slide

5 Conclusions

This paper studies the problem of online distributed optimization problem with strongly pseudoconvex function. To solve this problem, we propose an online distributed zeroth-order stochastic optimization algorithm based on the single-point gradient estimator. Under this algorithm, each agent updates actions based on its own state, the state information received from its neighbors at the current moment and the information of gradient estimation. The performance of the algorithm is measured by the expectation of the dynamic regret, if the time-varying graph is

G

-strongly connected. The result shows that the bound of dynamic regret in expectation grows sublinearly, if the increment rate of minimizer sequence deviation is within a certain range. The simulation experiment in the previous section has verified its effectiveness. By considering practical application, our future work will focus on how to solve the more general pseudoconvex problem.

References

Publishing order | Descend order by publishing year | Descend order by cited within

1	Akbari M, Gharesifard B, Linder T. Distributed online convex optimization on time-varying directed graphs. IEEE Transactions on Control of Network Systems, 2015, 4(3): 417- 428. Cited in this article [3]

2	Li X, Xie L, Hong Y. Distributed continuous time algorithm for a general nonsmooth monotropic optimization problem. International Journal of Robust and Nonlinear Control, 2019, 29(10): 3252- 3266. https://doi.org/10.1002/rnc.4547

3	Deng Z, Chen T. Distributed algorithm design for constrained resource allocation problems with high-order multi-agent systems. Automatica, 2022, 144, 110492. https://doi.org/10.1016/j.automatica.2022.110492

4	Lu K, Zhu Q, Yan X. Distributed ergodic algorithms for mixed equilibrium problems: Absent of cut property. Automatica, 2022, 141, 110297. https://doi.org/10.1016/j.automatica.2022.110297

5	Lei J, Chen H. Distributed stochastic approximation algorithm with expanding truncations. IEEE Transactions on Automatic Control, 2019, 65(2): 664- 679. Cited in this article [1]

6	Lu K, Zhu Q. Distributed algorithms involving fixed step size for mixed equilibrium problems with multiple set constraints. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(11): 5254- 5260. Cited in this article [1]

7	Yang T, Lu J, Wu D, et al. A distributed algorithm for economic dispatch over time-varying directed networks with delays. IEEE Transactions on Industrial Electronics, 2016, 64(6): 5095- 5106. Cited in this article [1]

8	Zhang J, You K, Cai K. Distributed dual gradient tracking for resource allocation in unbalanced networks. IEEE Transactions on Signal Processing, 2020, 68, 2186- 2198. https://doi.org/10.1109/TSP.2020.2981762 Cited in this article [1]

9	Ram S S, Nedić A, Veeravalli V V. Incremental stochastic subgradient algorithms for convex optimization. SIAM Journal on Optimization, 2009, 20(2): 691- 717. https://doi.org/10.1137/080726380 Cited in this article [1]

10	Yuan D, Xu S, Zhao H. Distributed primal-dual subgradient method for multiagent optimization via consensus algorithms. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2011, 41(6): 1715- 1724. https://doi.org/10.1109/TSMCB.2011.2160394 Cited in this article [2]

11	Nedi\'{c} A, Ozdaglar A, Parrilo P A. Constrained consensus and optimization in multi-agent networks. IEEE Transactions on Automatic Control, 2010, 55(4): 922- 938. https://doi.org/10.1109/TAC.2010.2041686 Cited in this article [2]

12	Lu K, Wang L. Online distributed optimization with nonconvex objective functions via dynamic regrets. IEEE Transactions on Automatic Control, 2023. https://doi.org/10.1109/TAC.2023.3239432 Cited in this article [4]

13	Zhu Y, Yu W, Wen G, et al. Continuous-time distributed subgradient algorithm for convex optimization with general constraints. IEEE Transactions on Automatic Control, 2018, 64(4): 1694- 1701. Cited in this article [1]

14	Wang H. Experimental implementation of multi-agent distributed optimization algorithms with shortest distance to convex regions. Riverside University of California, Riverside, 2016. Cited in this article [1]

15	Liu H, Yu W, Chen G. Discrete-time algorithms for distributed constrained convex optimization with linear convergence rates. IEEE Transactions on Cybernetics, 2020, 52(6): 4874- 4885. Cited in this article [1]

16	Chen T, Giannakis G B. Bandit convex optimization for scalable and dynamic IoT management. IEEE Internet of Things Journal, 2018, 6(1): 1276- 1286. Cited in this article [1]

17	Agarwal A, Dekel O, Xiao L. Optimal algorithms for online convex optimization with multi-point bandit feedback. Colt-the Conference on Learning Theory, DBLP, 2010, 28- 40. Cited in this article [1]

18	Duchi J C, Jordan M I, Wainwright M J, et al. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory, 2015, 61(5): 2788- 2806. https://doi.org/10.1109/TIT.2015.2409256 Cited in this article [3]

19	Nesterov Y, Spokoiny V. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 2017, 17, 527- 566. https://doi.org/10.1007/s10208-015-9296-2 Cited in this article [3]

20	Wang C, Xu S, Yuan D. Distributed online stochastic-constrained convex optimization with bandit feedback. IEEE Transactions on Cybernetics, 2024, 54(1): 63- 75. https://doi.org/10.1109/TCYB.2022.3177644 Cited in this article [1]

21	Sahu A K, Jakovetic D, Bajovic D, et al. Distributed zeroth order optimization over random networks: A Kiefer-Wolfowitz stochastic approximation approach, IEEE Conference on Decision and Control (CDC). IEEE, 2018, 4951- 4958. Cited in this article [1]

22	Pang Y, Hu G. Randomized gradient-free distributed optimization methods for a multiagent system with unknown cost function. IEEE Transactions on Automatic Control, 2019, 65(1): 333- 340. Cited in this article [1]

23	Chen X M, Gao C. Strong consistency of random gradient-free algorithms for distributed optimization. Optimal Control Applications and Methods, 2017, 38(2): 247- 265. https://doi.org/10.1002/oca.2254 Cited in this article [1]

24	Hu X, Wang J. A recurrent neural network for solving a class of general variational inequalities. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2007, 37(3): 528- 539. https://doi.org/10.1109/TSMCB.2006.886166 Cited in this article [1]

25	Konnov I V, Luc D T, Rubinov A M, et al. Some classes of pseudoconvex fractional functions via the Charnes-Cooper transformation. Springer Berlin Heidelberg, 2006. Cited in this article [2]

26	Forgo F, Joó I. Fixed point and equilibrium theorems in pseudoconvex and related spaces. Journal of Global Optimization, 1999, 14, 27- 54. https://doi.org/10.1023/A:1008252724328 Cited in this article [1]

27	Barber J R, Klarbring A. Solid mechanics and its applications. Springer, Berlin, 2003. Cited in this article [1]

28	Lu K, Jing G, Wang L. Online distributed optimization with strongly pseudoconvex-sum cost functions. IEEE Transactions on Automatic Control, 2019, 65(1): 426- 433. Cited in this article [5]

29	Yuan D, Hong Y, Ho D W C, et al. Optimal distributed stochastic mirror descent for strongly convex optimization. Automatica, 2018, 90, 196- 203. https://doi.org/10.1016/j.automatica.2017.12.053 Cited in this article [1]

30	Hall E C, Willett R M. Online convex optimization in dynamic environments. IEEE Journal of Selected Topics in Signal Processing, 2015, 9(4): 647- 662. https://doi.org/10.1109/JSTSP.2015.2404790 Cited in this article [3]

31	Besbes O, Gur Y, Zeevi A. Non-stationary stochastic optimization. Operations Research, 2015, 63(5): 1227- 1244. https://doi.org/10.1287/opre.2015.1408

32	Jadbabaie A, Rakhlin A, Shahrampour S, et al. Online optimization: Competing with dynamic comparators. Artificial Intelligence and Statistics, PMLR, 2015, 398- 406. Cited in this article [4]

33	Blondel V D, Hendrickx J M, Olshevsky A, et al. Convergence in multiagent coordination, consensus, and flocking. Proceedings of the 44th IEEE Conference on Decision and Control, IEEE, 2005, 2996- 3000. Cited in this article [2]

34	Lu K, Xu H. Online distributed optimization with strongly pseudoconvex-sum cost functions and coupled inequality constraints. Automatica, 2023, 156, 111203. https://doi.org/10.1016/j.automatica.2023.111203 Cited in this article [2]

35	Li W, Assaad M. Distributed zeroth-order stochastic optimization in time-varying networks. arXiv preprint arXiv: 2105.12597, 2021. Cited in this article [3]

36	Lu K, Zhu Q, Yan X. Distributed ergodic algorithms for mixed equilibrium problems: Absent of cut property. Automatica, 2022, 141, 110297. https://doi.org/10.1016/j.automatica.2022.110297 Cited in this article [2]

Funding

National Natural Science Foundation of China(62103169)

National Natural Science Foundation of China(51875380)

China Postdoctoral Science Foundation(2021M691313)

PDF(292 KB)

399

Accesses

Citation

Detail

Sections

Recommended

Abstract
Key words
Cite this article
1 Introduction
2 Problem Formulation
2.1 Basic Graph Theory
2.2 Online Distributed Pseudoconvex Optimization
2.3 Online Distributed Algorithms
3 Main Results
4 A Simulation Example
Figure 1 The communication graph
Figure 2 The trajectories of agents' states under algorithm (9)
Figure 3 The trajectory of $R_{i} (t) / t$ , $i = 1, \dots, 6$
5 Conclusions
References
Funding

Received	Accepted	Published
2023-08-02	2023-10-16	2024-02-25
Issue Date
2024-02-28

Please choose a citation manager

Content to export

Abstract

Key words

Cite this article

1 Introduction

2 Problem Formulation

2.1 Basic Graph Theory

2.2 Online Distributed Pseudoconvex Optimization

2.3 Online Distributed Algorithms

3 Main Results

4 A Simulation Example

Figure 1 The communication graph

Figure 2 The trajectories of agents' states under algorithm (9)

Figure 3 The trajectory of $R_{i} (t) / t$ , $i = 1, \dots, 6$

5 Conclusions

{{custom_sec.title}}

{{custom_sec.title}}

References

{{custom_fnGroup.title_en}}

Footnotes

Funding

Share

模态框（Modal）标题

Please choose a citation manager

Content to export

Abstract

Key words

Cite this article

1 Introduction

2 Problem Formulation

2.1 Basic Graph Theory

2.2 Online Distributed Pseudoconvex Optimization

2.3 Online Distributed Algorithms

3 Main Results

4 A Simulation Example

Figure 1 The communication graph

Figure 2 The trajectories of agents' states under algorithm (9)

Figure 3 The trajectory of Ri(t)/t, i=1,⋯,6

5 Conclusions

{{custom_sec.title}}

{{custom_sec.title}}

References

{{custom_fnGroup.title_en}}

Footnotes

Funding

Figure 3 The trajectory of $R_{i} (t) / t$ , $i = 1, \dots, 6$