A Symmetric Linearized Alternating Direction Method of Multipliers for a Class of Stochastic Optimization Problems

Jia HU, Qimin HU

Journal of Systems Science and Information, 2023, Vol. 11, Issue 1: 58-77. DOI: 10.21078/JSSI-2023-058-20

Abstract

The alternating direction method of multipliers (ADMM) has received much attention in recent years due to various demands from machine learning and big data related optimization. In 2013, Ouyang et al. extended the ADMM to the stochastic setting for solving some stochastic optimization problems, inspired by the structural risk minimization principle. In this paper, we consider a stochastic variant of symmetric ADMM, named symmetric stochastic linearized ADMM (SSL-ADMM). In particular, using the framework of variational inequality, we analyze the convergence properties of SSL-ADMM. Moreover, we show that, with high probability, SSL-ADMM has an $O(\ln N\cdot N^{-1/2})$ constraint violation bound and objective error bound for convex problems, and an $O((\ln N)^2\cdot N^{-1})$ constraint violation bound and objective error bound for strongly convex problems, where $N$ is the iteration number. Symmetric ADMM can improve the algorithmic performance compared to classical ADMM, and numerical experiments for statistical machine learning show that such an improvement is also present in the stochastic setting.

Key words

alternating direction method of multipliers / stochastic approximation / expected convergence rate and high probability bound / convex optimization / machine learning

Cite this article

Jia HU , Qimin HU. A Symmetric Linearized Alternating Direction Method of Multipliers for a Class of Stochastic Optimization Problems. Journal of Systems Science and Information, 2023, 11(1): 58-77 https://doi.org/10.21078/JSSI-2023-058-20

1 Introduction

We consider the following two-block separable convex optimization problem with linear equality constraints:
$$\min\big\{\theta_1(x)+\theta_2(y)\ \big|\ Ax+By=b,\ x\in\mathcal{X},\ y\in\mathcal{Y}\big\},$$
(1)
where $A\in\mathbb{R}^{n\times n_1}$, $B\in\mathbb{R}^{n\times n_2}$, $b\in\mathbb{R}^n$, $\mathcal{X}\subseteq\mathbb{R}^{n_1}$ and $\mathcal{Y}\subseteq\mathbb{R}^{n_2}$ are closed convex sets, and $\theta_1:\mathbb{R}^{n_1}\to\mathbb{R}$ and $\theta_2:\mathbb{R}^{n_2}\to\mathbb{R}$ are convex functions (not necessarily smooth), while $\theta_1$ has a specific structure. In particular, we assume that there is a stochastic first-order oracle (SFO) for $\theta_1$, which returns an unbiased and bounded stochastic gradient $G(x,\xi)$ at $x$, where $\xi$ is a random variable whose distribution is supported on $\Xi\subseteq\mathbb{R}^d$. Such an assumption is common in stochastic programming (SP), see, e.g., [1-3] and the references therein. In SP, the objective function is often in the form of an expectation, i.e., $\theta_1(x)=\int_\Xi\Theta(x,\xi)\,dP(\xi)$ for some $\Theta$, $P$, and $\Xi$, including the finite sum as a special case. In both cases (when the number of terms in the summation is large for the latter case), getting the full function value or gradient information is impractical. Motivated by this, we need to design stochastic approximation[4] based algorithms to solve problem (1).
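To make the SFO assumption concrete, here is a small Python sketch (ours, not from the paper) of such an oracle for a finite-sum logistic instance of $\theta_1$: sampling one data point uniformly at random yields an unbiased estimate of $\nabla\theta_1(x)$, and its variance is bounded whenever the features are bounded.

```python
import numpy as np

def make_logistic_sfo(features, labels):
    """Hypothetical SFO for theta_1(x) = (1/m) * sum_i log(1 + exp(-labels[i] * features[i] @ x)).

    Returning the gradient of one uniformly sampled term gives an unbiased
    estimate of the full gradient of theta_1 at x."""
    m = features.shape[0]

    def sfo(x, rng):
        i = rng.integers(m)                       # xi ~ Uniform{0, ..., m-1}
        l_i, t_i = features[i], labels[i]
        margin = -t_i * (l_i @ x)
        # d/dx log(1 + exp(margin)) = -t_i * l_i * sigmoid(margin)
        return (-t_i * l_i) / (1.0 + np.exp(-margin))

    return sfo

# usage sketch: sfo = make_logistic_sfo(L, t); g = sfo(x, np.random.default_rng(0))
```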
Problem (1) itself, as a linearly constrained convex optimization problem, is rich enough to characterize many optimization problems arising from various application fields, such as machine learning, image processing, and signal processing. In these fields, a typical scenario is that one of the functions represents a data fidelity term and the other a regularization term, see, e.g., [5] and the references therein. Without considering the specific structure, i.e., when the assumption of an SFO is not needed in the model, a classical method for solving problem (1) is the alternating direction method of multipliers (ADMM). ADMM was originally proposed by Glowinski and Marrocco[6], and Gabay and Mercier[7]; it is a Gauss-Seidel implementation of the augmented Lagrangian method[8], or an application of the Douglas-Rachford splitting method to the dual problem of (1)[9]. For both convex and non-convex problems, there are extensive studies on the theoretical properties of ADMM. In particular, for convex optimization problems, theoretical results on convergence behavior are abundant, whether global convergence, sublinear convergence rate, or linear convergence rate, see, e.g., [9-15]. Recently, ADMM has been studied on nonconvex models satisfying the well-known Kurdyka-Lojasiewicz (KL) inequality or other similar properties, see, e.g., [16-19]. For a thorough understanding of some recent developments of ADMM, one can refer to the survey[20].
However, as mentioned before, the gradient information of $\theta_1$ in (1) must be obtained through the SFO due to computational or other limitations, and hence the aforementioned ADMM does not apply. To tackle this problem, some stochastic ADMM-type algorithms have been proposed recently, see, e.g., [21-24]. Note that in these works, only the basic iterative scheme of ADMM was considered. It is well known that updating the dual variable symmetrically, in a more flexible way, often improves the algorithmic performance; this is the idea of symmetric ADMM (or the Peaceman-Rachford splitting method applied to the dual of problem (1)), see, e.g., [25-28]. In this paper, we study symmetric ADMM in the stochastic setting. In particular, we propose a symmetric stochastic linearized ADMM (SSL-ADMM) for solving the two-block separable stochastic optimization problem (1) and analyze the corresponding worst-case convergence rate by means of the framework of variational inequality. Moreover, we establish the large-deviation properties of SSL-ADMM under certain light-tail assumptions. Also, numerical experiments on the graph-guided fused lasso problem demonstrate its promising performance compared to non-symmetric ADMM.
The rest of this paper is organized as follows. We introduce some fundamental preliminaries in Section 2. Convergence properties of the proposed algorithm are analyzed in Section 3. The high probability guarantees for objective error and constraint violation of the proposed algorithm are investigated in Section 4. In Section 5, numerical results are presented to indicate the promising efficiency of symmetrically updating dual variables in the stochastic setting. Finally, a summary is made in Section 6.
Notations For two matrices $A$ and $B$, the ordering relation $A\succ B$ ($A\succeq B$) means $A-B$ is positive definite (semidefinite). $I_m$ denotes the $m\times m$ identity matrix. For a vector $x$, $\|x\|$ denotes its Euclidean norm; for a matrix $X$, $\|X\|$ denotes its spectral norm. For any symmetric matrix $G$, define $\|x\|_G^2:=x^TGx$ and $\|x\|_G:=\sqrt{x^TGx}$ if $G\succeq 0$. $\mathbb{E}[\cdot]$ denotes the mathematical expectation of a random variable. $\Pr\{\cdot\}$ denotes the probability of an event. $\partial$ and $\nabla$ denote the subdifferential and gradient operator of a function, respectively. We also sometimes use $(x,y)$ and $(x,y,\lambda)$ to denote the vectors $(x^T,y^T)^T$ and $(x^T,y^T,\lambda^T)^T$, respectively.

2 Preliminaries

In this section, we summarize some preliminaries that will be used in the later analysis. Let the Lagrangian function of problem (1) be
$$L(x,y,\lambda)=\theta_1(x)+\theta_2(y)-\lambda^T(Ax+By-b),$$
defined on $\mathcal{X}\times\mathcal{Y}\times\mathbb{R}^n$. We call $(x^*,y^*,\lambda^*)\in\mathcal{X}\times\mathcal{Y}\times\mathbb{R}^n$ a saddle point of $L(x,y,\lambda)$ if the following inequalities are satisfied:
$$L(x^*,y^*,\lambda)\le L(x^*,y^*,\lambda^*)\le L(x,y,\lambda^*),\quad\forall\lambda\in\mathbb{R}^n,\ \forall x\in\mathcal{X},\ \forall y\in\mathcal{Y}.$$
Obviously, a saddle point $(x^*,y^*,\lambda^*)$ can be characterized by the following inequalities:
$$\begin{cases}x^*\in\mathcal{X}, & L(x,y^*,\lambda^*)-L(x^*,y^*,\lambda^*)\ge 0,\quad\forall x\in\mathcal{X},\\ y^*\in\mathcal{Y}, & L(x^*,y,\lambda^*)-L(x^*,y^*,\lambda^*)\ge 0,\quad\forall y\in\mathcal{Y},\\ \lambda^*\in\mathbb{R}^n, & L(x^*,y^*,\lambda^*)-L(x^*,y^*,\lambda)\ge 0,\quad\forall\lambda\in\mathbb{R}^n.\end{cases}$$
Below we invoke two propositions, one of which characterizes the optimality condition of an optimization model by a variational inequality and the other gives a result for the martingale-difference sequence.
Proposition 1 Let $\mathcal{X}\subseteq\mathbb{R}^n$ be a closed convex set and let $\theta(x):\mathbb{R}^n\to\mathbb{R}$ and $f(x):\mathbb{R}^n\to\mathbb{R}$ be convex functions. In addition, $f(x)$ is differentiable. Assuming that the solution set of the minimization problem $\min\{\theta(x)+f(x)\mid x\in\mathcal{X}\}$ is nonempty, we have the assertion that
$$x^*=\arg\min\{\theta(x)+f(x)\mid x\in\mathcal{X}\}$$
if and only if
$$x^*\in\mathcal{X},\quad\theta(x)-\theta(x^*)+(x-x^*)^T\nabla f(x^*)\ge 0,\quad\forall x\in\mathcal{X}.$$
Proof The proof can be found in [31].
Proposition 2 Let $\xi_{[t]}:=\{\xi_1,\xi_2,\cdots,\xi_t\}$ be a sequence of independent identically distributed random variables, and let $\zeta_t=\zeta_t(\xi_{[t]})$ be deterministic Borel functions of $\xi_{[t]}$ such that $\mathbb{E}_{|\xi_{[t-1]}}[\zeta_t]=0$ almost surely and $\mathbb{E}_{|\xi_{[t-1]}}[\exp\{\zeta_t^2/\sigma_t^2\}]\le\exp\{1\}$ almost surely, where $\sigma_t>0$ are deterministic and $\mathbb{E}_{|Y}[X]$ denotes the expectation of the random variable $X$ conditional on the random variable $Y$. Then
$$\forall\lambda\ge 0:\quad\Pr\Big\{\sum_{t=1}^N\zeta_t>\lambda\sqrt{\sum_{t=1}^N\sigma_t^2}\Big\}\le\exp\{-\lambda^2/3\}.$$
Proof The proof can be found in Lemma 4.1 on pages 116-117 of [32].
Hence, using Proposition 1 and provided that the solution set of problem (1) is nonempty, solving (1) is equivalent to solving the following variational inequality problem: Find $w^*=(x^*,y^*,\lambda^*)\in\Omega:=\mathcal{X}\times\mathcal{Y}\times\mathbb{R}^n$ such that
$$\theta(u)-\theta(u^*)+(w-w^*)^TF(w^*)\ge 0,\quad\forall w\in\Omega,$$
where
$$u=\begin{pmatrix}x\\y\end{pmatrix},\quad w=\begin{pmatrix}x\\y\\\lambda\end{pmatrix},\quad F(w)=\begin{pmatrix}-A^T\lambda\\-B^T\lambda\\Ax+By-b\end{pmatrix},\quad\text{and}\quad\theta(u)=\theta_1(x)+\theta_2(y).$$
The variables with superscripts or subscripts such as $u^k,w^k,\bar u_k,\bar w_k$ are defined similarly. In addition, we define two auxiliary sequences for the convergence analysis. More specifically, for the sequence $\{w^k\}$ generated by the SSL-ADMM in Section 3, let
$$\tilde w^k=\begin{pmatrix}\tilde x^k\\\tilde y^k\\\tilde\lambda^k\end{pmatrix}=\begin{pmatrix}x^{k+1}\\y^{k+1}\\\lambda^k-\beta(Ax^{k+1}+By^k-b)\end{pmatrix}\quad\text{and}\quad\tilde u^k=\begin{pmatrix}\tilde x^k\\\tilde y^k\end{pmatrix}.$$
(2)
Throughout the paper, we need the following assumptions:
Assumption
(i) The primal-dual solution set $\Omega^*$ of problem (1) is nonempty.
(ii) $\theta_1(x)$ is differentiable, and its gradient satisfies the $L$-Lipschitz condition
$$\|\nabla\theta_1(x_1)-\nabla\theta_1(x_2)\|\le L\|x_1-x_2\|$$
for all $x_1,x_2\in\mathcal{X}$.
(iii)
$$\text{a)}\ \mathbb{E}[G(x,\xi)]=\nabla\theta_1(x)\quad\text{and}\quad\text{b)}\ \mathbb{E}\big[\|G(x,\xi)-\nabla\theta_1(x)\|^2\big]\le\sigma^2,$$
where $\sigma>0$ is some constant.
Under the second assumption, it holds that for all $x,y\in\mathcal{X}$,
$$\theta_1(x)\le\theta_1(y)+(x-y)^T\nabla\theta_1(y)+\frac{L}{2}\|x-y\|^2.$$
A direct result of combining this property with convexity is shown in the following lemma.
Lemma 1 Suppose the function $f$ is convex and differentiable, and its gradient is $L$-Lipschitz continuous; then for any $x,y,z$ we have
$$(x-y)^T\nabla f(z)\le f(x)-f(y)+\frac{L}{2}\|y-z\|^2.$$
In addition, if $f$ is $\mu$-strongly convex, then for any $x,y,z$ we have
$$(x-y)^T\nabla f(z)\le f(x)-f(y)+\frac{L}{2}\|y-z\|^2-\frac{\mu}{2}\|x-z\|^2.$$
Proof Since the gradient of $f$ is $L$-Lipschitz continuous, for any $y,z$ we have $f(y)\le f(z)+(y-z)^T\nabla f(z)+\frac{L}{2}\|y-z\|^2$. Also, due to the convexity of $f$, we have for any $x,z$, $f(x)\ge f(z)+(x-z)^T\nabla f(z)$. Adding the above two inequalities, we get the conclusion. If $f$ is $\mu$-strongly convex, then for any $x,z$, $f(x)\ge f(z)+(x-z)^T\nabla f(z)+\frac{\mu}{2}\|x-z\|^2$. Then combine this inequality with $f(y)\le f(z)+(y-z)^T\nabla f(z)+\frac{L}{2}\|y-z\|^2$, and the proof is completed.

3 Symmetric Stochastic Linearized ADMM

In this section, we present and analyze the iterative scheme of the proposed symmetric stochastic linearized ADMM, named SSL-ADMM.
We give some remarks on Algorithm 1, called SSL-ADMM for short. SSL-ADMM is an ADMM-type algorithm which alternates through one $x$-subproblem, an update of the multipliers, one $y$-subproblem, and an update of the multipliers again. The algorithm is symmetric since the dual variable is updated twice, in a symmetric fashion, at each iteration. The algorithm is stochastic since at each iteration the SFO is called to obtain a stochastic gradient $G(x^k,\xi)$, which is an unbiased estimate of $g(x^k)$, the gradient of $\theta_1(x)$ at $x^k$, and is bounded relative to $g(x^k)$ in expectation. The algorithm is linearized due to the following two aspects: (i) The term $G(x^k,\xi)^T(x-x^k)$ in the $x$-subproblem of SSL-ADMM is a stochastic version of the linearization of $\theta_1$ at $x^k$. (ii) The $x$-subproblem and $y$-subproblem are augmented with the proximal terms $\frac12\|x-x^k\|_{G_{1,k}}^2$ and $\frac12\|y-y^k\|_{G_{2,k}}^2$ respectively, where $\{G_{1,k}\}$ and $\{G_{2,k}\}$ are two sequences of symmetric and positive definite matrices that can change with the iteration; with the choice $G_{2,k}\equiv\tau I_{n_2}-\beta B^TB$, $\tau>\beta\|B^TB\|$, the quadratic term in the $y$-subproblem is linearized. The same fact applies to the $x$-subproblem. Furthermore, when $G_{1,k}\equiv I_{n_1}$ or is of the form $\tau I_{n_1}-\beta A^TA$ with $\tau>0$, the term $\frac12\|y-y^k\|_{G_{2,k}}^2$ vanishes, and $r=0$, SSL-ADMM reduces to the algorithms that appeared in the earlier literature [21, 24]. Finally, the convergence region $\mathcal{D}$ of $(r,s)$ is the same as that in [27]. In particular, if $r=0$, then $s\in(0,\frac{1+\sqrt 5}{2}]$. Recently, Bai et al.[29, 30] studied stochastic ADMM algorithms for finite-sum optimization problems. The difference between our paper and theirs is that our goal is to consider the more general stochastic optimization problem (1), where the random variable $\xi$ is not necessarily discrete. For example, in SP, while it is possible to approximate the function $\theta_1$ by the sample average approximation technique, the extra sample approximation error would also have to be taken into account.
 Algorithm 1: Symmetric Stochastic Linearized ADMM (SSL-ADMM)
  Initialize $x^0\in\mathcal{X}$, $y^0\in\mathcal{Y}$, $\lambda^0$, $\beta$, $(r,s)\in\mathcal{D}$, and two sequences of symmetric and positive semidefinite matrices $\{G_{1,k}\}$ and $\{G_{2,k}\}$, where
   $\mathcal{D}=\{(r,s)\mid r+s>0,\ r\le 1,\ -r^2-s^2-rs+r+s+1\ge 0\}.$
  for $k=0,1,\cdots$
   Call the SFO to obtain $G(x^k,\xi)$;
   $x^{k+1}=\arg\min_{x\in\mathcal{X}}\big\{G(x^k,\xi)^T(x-x^k)-x^TA^T\lambda^k+\frac{\beta}{2}\|Ax+By^k-b\|^2+\frac12\|x-x^k\|_{G_{1,k}}^2\big\}$;
   $\lambda^{k+\frac12}=\lambda^k-r\beta(Ax^{k+1}+By^k-b)$;
   $y^{k+1}=\arg\min_{y\in\mathcal{Y}}\big\{\theta_2(y)-y^TB^T\lambda^{k+\frac12}+\frac{\beta}{2}\|Ax^{k+1}+By-b\|^2+\frac12\|y-y^k\|_{G_{2,k}}^2\big\}$;
   $\lambda^{k+1}=\lambda^{k+\frac12}-s\beta(Ax^{k+1}+By^{k+1}-b)$.
  end
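For concreteness, the following Python sketch (ours, not from the paper) implements one SSL-ADMM iteration under the linearizing choices discussed above, namely $G_{1,k}=\tau_kI_{n_1}-\beta A^TA$ and $G_{2,k}=\eta I_{n_2}-\beta B^TB$, which make both subproblems explicit; `sfo`, `prox_theta2`, and `project_X` are hypothetical user-supplied callables, and $\mathcal{Y}=\mathbb{R}^{n_2}$ is assumed so that the $y$-update is a plain proximal step.

```python
def ssl_admm_step(x, y, lam, A, B, b, beta, r, s, tau, eta,
                  sfo, prox_theta2, project_X, rng):
    """One SSL-ADMM iteration (NumPy arrays) under the choices
    G_{1,k} = tau*I - beta*A^T A and G_{2,k} = eta*I - beta*B^T B,
    which make both subproblems explicit."""
    g = sfo(x, rng)                                  # stochastic gradient G(x^k, xi)

    # x-update: projected step on the linearized augmented Lagrangian
    x_new = project_X(x - (g - A.T @ lam + beta * A.T @ (A @ x + B @ y - b)) / tau)

    # first (partial) dual update
    lam_half = lam - r * beta * (A @ x_new + B @ y - b)

    # y-update: proximal mapping of theta_2 (Y = R^{n_2} assumed)
    v = y - (-B.T @ lam_half + beta * B.T @ (A @ x_new + B @ y - b)) / eta
    y_new = prox_theta2(v, 1.0 / eta)

    # second dual update
    lam_new = lam_half - s * beta * (A @ x_new + B @ y_new - b)
    return x_new, y_new, lam_new
```

With $r=0$ and $s=1$ the two dual updates collapse into the classical single update, which roughly recovers the non-symmetric stochastic linearized ADMM of [21] used as the baseline in Section 5.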
We start to establish the convergence of SSL-ADMM. The next several lemmas aim to obtain an upper bound of $\theta(\tilde u^k)-\theta(u)+(\tilde w^k-w)^TF(\tilde w^k)$. With such a bound, it is possible to estimate the worst-case convergence rate of SSL-ADMM.
Lemma 2 Let the sequence $\{w^k\}$ be generated by the SSL-ADMM and the associated $\{\tilde w^k\}$ be defined in (2). Then we have
$$\theta(u)-\theta(\tilde u^k)+(w-\tilde w^k)^TF(\tilde w^k)\ge(w-\tilde w^k)^TQ_k(w^k-\tilde w^k)-(x-\tilde x^k)^T\delta_k-\frac{L}{2}\|x^k-\tilde x^k\|^2,\quad\forall w\in\Omega,$$
(3)
where $\delta_k=G(x^k,\xi)-\nabla\theta_1(x^k)$, similarly hereinafter, and
$$Q_k=\begin{pmatrix}G_{1,k}&0&0\\0&\beta B^TB+G_{2,k}&-rB^T\\0&-B&\frac{1}{\beta}I_n\end{pmatrix}.$$
(4)
Proof Due to Proposition 1, the optimality condition of the $x$-subproblem in SSL-ADMM is
$$(x-x^{k+1})^T\big(G(x^k,\xi)-A^T(\lambda^k-\beta(Ax^{k+1}+By^k-b))+G_{1,k}(x^{k+1}-x^k)\big)\ge 0,\quad\forall x\in\mathcal{X}.$$
Using the notation in (2), the above inequality can be rewritten as
$$(x-\tilde x^k)^T\big(\nabla\theta_1(x^k)+\delta_k-A^T\tilde\lambda^k+G_{1,k}(\tilde x^k-x^k)\big)\ge 0,\quad\forall x\in\mathcal{X}.$$
Then, using Lemma 1, we have
$$\theta_1(x)-\theta_1(\tilde x^k)+(x-\tilde x^k)^T(-A^T\tilde\lambda^k)\ge(x-\tilde x^k)^TG_{1,k}(x^k-\tilde x^k)-(x-\tilde x^k)^T\delta_k-\frac{L}{2}\|x^k-\tilde x^k\|^2,\quad\forall x\in\mathcal{X}.$$
(5)
Similarly, the optimality condition of the $y$-subproblem in SSL-ADMM is
$$\theta_2(y)-\theta_2(y^{k+1})+(y-y^{k+1})^T\big(-B^T\lambda^{k+\frac12}+\beta B^T(Ax^{k+1}+By^{k+1}-b)+G_{2,k}(y^{k+1}-y^k)\big)\ge 0,\quad\forall y\in\mathcal{Y}.$$
(6)
Using the notation of $\tilde\lambda^k$, $\lambda^{k+\frac12}=\lambda^k-r(\lambda^k-\tilde\lambda^k)=\tilde\lambda^k+(1-r)(\lambda^k-\tilde\lambda^k)$. Hence
$$-B^T\lambda^{k+\frac12}+\beta B^T(Ax^{k+1}+By^{k+1}-b)=-B^T\big(\tilde\lambda^k+(1-r)(\lambda^k-\tilde\lambda^k)\big)+B^T(\lambda^k-\tilde\lambda^k)+\beta B^TB(\tilde y^k-y^k)=-B^T\tilde\lambda^k+rB^T(\lambda^k-\tilde\lambda^k)+\beta B^TB(\tilde y^k-y^k).$$
Substituting this equality into (6), we obtain
$$\theta_2(y)-\theta_2(\tilde y^k)+(y-\tilde y^k)^T(-B^T\tilde\lambda^k)\ge(y-\tilde y^k)^T(\beta B^TB+G_{2,k})(y^k-\tilde y^k)-r(y-\tilde y^k)^TB^T(\lambda^k-\tilde\lambda^k).$$
(7)
According to the definition of $\tilde w^k$, we have
$$(A\tilde x^k+B\tilde y^k-b)-B(\tilde y^k-y^k)+\frac{1}{\beta}(\tilde\lambda^k-\lambda^k)=0,$$
and it can be written as
$$(\lambda-\tilde\lambda^k)^T\Big\{(A\tilde x^k+B\tilde y^k-b)-B(\tilde y^k-y^k)+\frac{1}{\beta}(\tilde\lambda^k-\lambda^k)\Big\}\ge 0,\quad\forall\lambda\in\mathbb{R}^n.$$
(8)
Combining (5), (7), and (8), and using the notation of (4), the proof is completed.
Lemma 3 Let the sequence $\{w^k\}$ be generated by the SSL-ADMM and the associated $\{\tilde w^k\}$ be defined in (2). Then we have
$$w^{k+1}=w^k-M(w^k-\tilde w^k),$$
(9)
where
$$M=\begin{pmatrix}I_{n_1}&0&0\\0&I_{n_2}&0\\0&-s\beta B&(r+s)I_n\end{pmatrix}.$$
(10)
Proof
$$\lambda^{k+1}=\lambda^{k+\frac12}-s\beta(Ax^{k+1}+By^{k+1}-b)=\lambda^k-r(\lambda^k-\tilde\lambda^k)-s\big(\beta(Ax^{k+1}+By^k-b)-\beta B(y^k-y^{k+1})\big)=\lambda^k-(r+s)(\lambda^k-\tilde\lambda^k)+s\beta B(y^k-\tilde y^k).$$
Together with $x^{k+1}=\tilde x^k$ and $y^{k+1}=\tilde y^k$, we prove the assertion of this lemma.
Note that for the matrices $Q_k$ defined in (4) and $M$ defined in (10), there is a matrix $H_k$ such that $Q_k=H_kM$, where
$$H_k=\begin{pmatrix}G_{1,k}&0&0\\0&\big(1-\frac{rs}{r+s}\big)\beta B^TB+G_{2,k}&-\frac{r}{r+s}B^T\\0&-\frac{r}{r+s}B&\frac{1}{\beta(r+s)}I_n\end{pmatrix}.$$
(11)
It is easy to check that for any $(r,s)\in\mathcal{D}$, $H_k$ is positive semidefinite when the matrix $B$ has full column rank. In fact, to make $H_k$ positive semidefinite alone, it is sufficient to have $G_{2,k}\succeq(r-1)\beta B^TB$ when the other conditions are satisfied.
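For completeness, the factorization $Q_k=H_kM$ can be verified directly by block multiplication:
$$H_kM=\begin{pmatrix}G_{1,k}&0&0\\0&\big(1-\frac{rs}{r+s}\big)\beta B^TB+G_{2,k}&-\frac{r}{r+s}B^T\\0&-\frac{r}{r+s}B&\frac{1}{\beta(r+s)}I_n\end{pmatrix}\begin{pmatrix}I_{n_1}&0&0\\0&I_{n_2}&0\\0&-s\beta B&(r+s)I_n\end{pmatrix}=\begin{pmatrix}G_{1,k}&0&0\\0&\beta B^TB+G_{2,k}&-rB^T\\0&-B&\frac{1}{\beta}I_n\end{pmatrix}=Q_k,$$
where the (2,2) block uses $\big(1-\frac{rs}{r+s}\big)\beta B^TB+\frac{rs}{r+s}\beta B^TB=\beta B^TB$ and the (3,2) block uses $-\frac{r}{r+s}B-\frac{s}{r+s}B=-B$.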
Lemma 4 Let the sequence $\{w^k\}$ be generated by the SSL-ADMM and the associated $\{\tilde w^k\}$ be defined in (2). Then we have
$$(w-\tilde w^k)^TQ_k(w^k-\tilde w^k)=\frac12\big(\|w-w^{k+1}\|_{H_k}^2-\|w-w^k\|_{H_k}^2\big)+\frac12(w^k-\tilde w^k)^TG(w^k-\tilde w^k),$$
(12)
where $G:=Q_k+Q_k^T-M^TH_kM$.
Proof Using $Q_k=H_kM$ and $w^{k+1}=w^k-M(w^k-\tilde w^k)$, we have $(w-\tilde w^k)^TQ_k(w^k-\tilde w^k)=(w-\tilde w^k)^TH_k(w^k-w^{k+1})$. Applying the identity $(a-b)^TH(c-d)=\frac12(\|a-d\|_H^2-\|a-c\|_H^2)+\frac12(\|c-b\|_H^2-\|d-b\|_H^2)$, we obtain
$$(w-\tilde w^k)^TH_k(w^k-w^{k+1})=\frac12\big(\|w-w^{k+1}\|_{H_k}^2-\|w-w^k\|_{H_k}^2\big)+\frac12\big(\|w^k-\tilde w^k\|_{H_k}^2-\|w^{k+1}-\tilde w^k\|_{H_k}^2\big).$$
The remaining task is to simplify the last two terms:
$$\|w^k-\tilde w^k\|_{H_k}^2-\|w^{k+1}-\tilde w^k\|_{H_k}^2=\|w^k-\tilde w^k\|_{H_k}^2-\|(w^{k+1}-w^k)+(w^k-\tilde w^k)\|_{H_k}^2=\|w^k-\tilde w^k\|_{H_k}^2-\|(I_{n_1+n_2+n}-M)(w^k-\tilde w^k)\|_{H_k}^2=(w^k-\tilde w^k)^T\big(H_k-(I_{n_1+n_2+n}-M)^TH_k(I_{n_1+n_2+n}-M)\big)(w^k-\tilde w^k)=(w^k-\tilde w^k)^T(Q_k+Q_k^T-M^TH_kM)(w^k-\tilde w^k).$$
The proof is completed.
From this lemma, $(w-\tilde w^k)^TQ_k(w^k-\tilde w^k)$ can be written as two terms, one of which is suitable for a recursive operation and the other is a quadratic term, but the matrix $G$ is not necessarily semidefinite. Thus we need to analyze this quadratic term in detail. Since
$$G=\begin{pmatrix}G_{1,k}&0&0\\0&(1-s)\beta B^TB+G_{2,k}&(s-1)B^T\\0&(s-1)B&\frac{2-r-s}{\beta}I_n\end{pmatrix},$$
$$(w^k-\tilde w^k)^TG(w^k-\tilde w^k)=\|x^k-x^{k+1}\|_{G_{1,k}}^2+\|y^k-y^{k+1}\|_{G_{2,k}}^2+(1-s)\beta\|B(y^k-y^{k+1})\|^2+\frac{2-r-s}{\beta}\|\lambda^k-\tilde\lambda^k\|^2+2(s-1)(\lambda^k-\tilde\lambda^k)^TB(y^k-y^{k+1}).$$
As the following lemma shows, the last two terms of the above equality can be further analyzed.
Lemma 5 Assume that $(r,s)\in\mathcal{D}$. Let the sequence $\{w^k\}$ be generated by the SSL-ADMM and the associated $\{\tilde w^k\}$ be defined in (2). Then we have
$$\frac{2-r-s}{\beta}\|\lambda^k-\tilde\lambda^k\|^2+2(s-1)(\lambda^k-\tilde\lambda^k)^TB(y^k-y^{k+1})\ge\frac{2(1-r)(1-s)}{1+r}\beta(Ax^k+By^k-b)^TB(y^k-y^{k+1})+\Big(s-r-\frac{2r(1-r)}{1+r}\Big)\beta\|B(y^k-y^{k+1})\|^2+(2-r-s)\beta\|Ax^{k+1}+By^{k+1}-b\|^2+\frac{2(1-r)\beta}{1+r}\Big(\|y^{k+1}-y^k\|_{G_{2,k}}^2-\frac12\|y^k-y^{k-1}\|_{G_{2,k-1}}^2-\frac12\|y^{k+1}-y^k\|_{G_{2,k-1}}^2\Big).$$
(13)
Proof It follows from the optimality condition of the $y$-subproblem at the $(k+1)$-th iteration that
$$\theta_2(y^k)-\theta_2(y^{k+1})+(y^k-y^{k+1})^T\big(-B^T\lambda^{k+\frac12}+\beta B^T(Ax^{k+1}+By^{k+1}-b)+G_{2,k}(y^{k+1}-y^k)\big)\ge 0.$$
Similarly, it follows from the optimality condition of the $y$-subproblem at the $k$-th iteration that
$$\theta_2(y^{k+1})-\theta_2(y^k)+(y^{k+1}-y^k)^T\big(-B^T\lambda^{k-\frac12}+\beta B^T(Ax^k+By^k-b)+G_{2,k-1}(y^k-y^{k-1})\big)\ge 0.$$
Adding these two inequalities and using
$$\lambda^{k-\frac12}-\lambda^{k+\frac12}=r\beta(Ax^{k+1}+By^{k+1}-b)+s\beta(Ax^k+By^k-b)+r\beta B(y^k-y^{k+1}),$$
we obtain
$$\big\{(r+1)\beta(Ax^{k+1}+By^{k+1}-b)+(s-1)\beta(Ax^k+By^k-b)+r\beta B(y^k-y^{k+1})\big\}^TB(y^k-y^{k+1})\ge(y^k-y^{k+1})^T\big(G_{2,k-1}(y^k-y^{k-1})-G_{2,k}(y^{k+1}-y^k)\big).$$
This implies
$$(Ax^{k+1}+By^{k+1}-b)^TB(y^k-y^{k+1})\ge\frac{1-s}{1+r}(Ax^k+By^k-b)^TB(y^k-y^{k+1})-\frac{r}{1+r}\|B(y^k-y^{k+1})\|^2+\frac{\beta}{1+r}\Big(\|y^{k+1}-y^k\|_{G_{2,k}}^2-\frac12\|y^k-y^{k-1}\|_{G_{2,k-1}}^2-\frac12\|y^{k+1}-y^k\|_{G_{2,k-1}}^2\Big).$$
(14)
On the other hand, we have
$$2(s-1)(\lambda^k-\tilde\lambda^k)^TB(y^k-y^{k+1})=2(s-1)\beta(Ax^{k+1}+By^k-b)^TB(y^k-y^{k+1})=2(s-1)\big\{\beta(Ax^{k+1}+By^{k+1}-b)^TB(y^k-y^{k+1})+\beta\|B(y^k-y^{k+1})\|^2\big\}$$
(15)
and
$$\frac{2-r-s}{\beta}\|\lambda^k-\tilde\lambda^k\|^2=(2-r-s)\beta\|Ax^{k+1}+By^k-b\|^2=(2-r-s)\beta\|(Ax^{k+1}+By^{k+1}-b)+B(y^k-y^{k+1})\|^2=(2-r-s)\beta\|Ax^{k+1}+By^{k+1}-b\|^2+(2-r-s)\beta\|B(y^k-y^{k+1})\|^2+2(2-r-s)\beta(Ax^{k+1}+By^{k+1}-b)^TB(y^k-y^{k+1}).$$
(16)
Combining (14), (15), and (16), we get the assertion of this lemma.
According to this lemma and using the Cauchy-Schwarz inequality, the term $(w^k-\tilde w^k)^TG(w^k-\tilde w^k)$ can be bounded as follows:
$$(w^k-\tilde w^k)^TG(w^k-\tilde w^k)\ge\Big(2-r-s-\frac{(1-s)^2}{1+r}\Big)\beta\|Ax^{k+1}+By^{k+1}-b\|^2+\|x^k-x^{k+1}\|_{G_{1,k}}^2+\|y^k-y^{k+1}\|_{G_{2,k}}^2+\frac{2(1-r)\beta}{1+r}\Big(\|y^{k+1}-y^k\|_{G_{2,k}}^2-\frac12\|y^k-y^{k-1}\|_{G_{2,k-1}}^2-\frac12\|y^{k+1}-y^k\|_{G_{2,k-1}}^2\Big)+\frac{(1-s)^2}{1+r}\beta\big(\|Ax^{k+1}+By^{k+1}-b\|^2-\|Ax^k+By^k-b\|^2\big).$$
(17)
Now, combining (17), Lemma 2, and Lemma 4, we obtain the following main theorem. In this theorem, we take $G_{1,k}$ of the form $\tau_kI_{n_1}-\beta A^TA$, $\tau_k>0$, which simplifies the system of linear equations in the $x$-subproblem, and $G_{2,k}\equiv G_2$. Of course, $G_2$ can also take a similar form to $G_{1,k}$. In particular, if $G_2=\eta I_{n_2}-\beta B^TB$ with $\eta\ge\beta\|B^TB\|$, then the $y$-subproblem reduces to the proximal mapping of $\theta_2$.
Theorem 1 Assume that $(r,s)\in\mathcal{D}$. Let the sequence $\{w^k\}$ be generated by the SSL-ADMM and the associated $\{\tilde w^k\}$ be defined in (2), and let
$$\bar w_N=\frac{1}{N}\sum_{t=1}^N\tilde w^t$$
for some pre-selected integer $N$. Choosing $\tau_k\equiv\sqrt N+M$, where $M$ is a constant satisfying the ordering relation $MI_{n_1}\succeq LI_{n_1}+\beta A^TA$, we have
$$\theta(\bar u_N)-\theta(u)+(\bar w_N-w)^TF(w)\le\frac{1}{2N}\|w^1-w\|_{H_1}^2+\frac{(1-r)\beta}{2(1+r)N}\|y^1-y^0\|_{G_2}^2+\frac{(1-s)^2\beta}{2(1+r)N}\|Ax^1+By^1-b\|^2+\frac{1}{N}\sum_{t=1}^N(x-x^t)^T\delta_t+\frac{1}{2N\sqrt N}\sum_{t=1}^N\|\delta_t\|^2.$$
(18)
Proof It suffices to use the convexity of $\theta$, the inequality $(x^k-x^{k+1})^T\delta_k\le\frac{\sqrt N}{2}\|x^k-x^{k+1}\|^2+\frac{1}{2\sqrt N}\|\delta_k\|^2$, and the identity $(\tilde w^k-w)^TF(w)=(\tilde w^k-w)^TF(\tilde w^k)$.
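In slightly more detail (our sketch of the omitted step, under the choice $\tau_k\equiv\sqrt N+M$): since $\tilde x^t=x^{t+1}$,
$$(x-\tilde x^t)^T\delta_t=(x-x^t)^T\delta_t+(x^t-x^{t+1})^T\delta_t\le(x-x^t)^T\delta_t+\frac{\sqrt N}{2}\|x^t-x^{t+1}\|^2+\frac{1}{2\sqrt N}\|\delta_t\|^2,$$
and the term $\frac{\sqrt N}{2}\|x^t-x^{t+1}\|^2$, together with $\frac{L}{2}\|x^t-\tilde x^t\|^2$ from Lemma 2, is absorbed by $\frac12\|x^t-x^{t+1}\|_{G_{1,t}}^2$ in (17), because $G_{1,t}=\tau_tI_{n_1}-\beta A^TA\succeq(\sqrt N+L)I_{n_1}$; summing over $t=1,\cdots,N$, dividing by $N$, and using the convexity of $\theta$ then yields (18).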
Corollary 1 Assume that all the conditions in Theorem 1 hold; then SSL-ADMM has the following properties:
(i)
$$\mathbb{E}[\|A\bar x_N+B\bar y_N-b\|]\le\frac{1}{2N}\|w^1-(x^*,y^*,\lambda^*+e)\|_{H_1}^2+\frac{(1-r)\beta}{2(1+r)N}\|y^1-y^0\|_{G_2}^2+\frac{(1-s)^2\beta}{2(1+r)N}\|Ax^1+By^1-b\|^2+\frac{\sigma^2}{2\sqrt N},$$
(19)
(ii)
$$\mathbb{E}[\theta(\bar u_N)-\theta(u^*)]\le(\|\lambda^*\|+1)\Big(\frac{1}{2N}\|w^1-(x^*,y^*,\lambda^*+e)\|_{H_1}^2+\frac{(1-r)\beta}{2(1+r)N}\|y^1-y^0\|_{G_2}^2\Big)+(\|\lambda^*\|+1)\Big(\frac{(1-s)^2\beta}{2(1+r)N}\|Ax^1+By^1-b\|^2+\frac{\sigma^2}{2\sqrt N}\Big),$$
(20)
where $e$ is a unit vector satisfying $e^T(A\bar x_N+B\bar y_N-b)=-\|A\bar x_N+B\bar y_N-b\|$ and the expectation is taken conditional on $w^1$.
Proof Let $w=(x^*,y^*,\lambda)$ in (18), where $\lambda=\lambda^*+e$; then the left-hand side of (18) is $\theta(\bar u_N)-\theta(u^*)-(\lambda^*)^T(A\bar x_N+B\bar y_N-b)+\|A\bar x_N+B\bar y_N-b\|$, which follows from
$$(\bar w_N-w)^TF(w)=(\bar x_N-x^*)^T(-A^T\lambda)+(\bar y_N-y^*)^T(-B^T\lambda)+(\bar\lambda_N-\lambda)^T(Ax^*+By^*-b)=\lambda^T(Ax^*+By^*-b)-\lambda^T(A\bar x_N+B\bar y_N-b)=-(\lambda^*)^T(A\bar x_N+B\bar y_N-b)+\|A\bar x_N+B\bar y_N-b\|,$$
where the first equality follows from the definition of $F$, and the second and last equalities hold due to $Ax^*+By^*-b=0$ and the choice of $\lambda$. On the other hand, substituting $w=\bar w_N$ into the variational inequality associated with (1), we get $\theta(\bar u_N)-\theta(u^*)-(\lambda^*)^T(A\bar x_N+B\bar y_N-b)\ge 0$. Hence, the left-hand side of (18) is greater than or equal to $\|A\bar x_N+B\bar y_N-b\|$ when letting $w=(x^*,y^*,\lambda^*+e)$, and (19) is obtained by taking expectation. With the same choice of $w$, we can also get
$$\theta(\bar u_N)-\theta(u^*)+(\bar w_N-w)^TF(w)=\theta(\bar u_N)-\theta(u^*)-\lambda^T(A\bar x_N+B\bar y_N-b)\ge\theta(\bar u_N)-\theta(u^*)-\|\lambda\|\|A\bar x_N+B\bar y_N-b\|,$$
i.e.,
$$\theta(\bar u_N)-\theta(u^*)\le\theta(\bar u_N)-\theta(u^*)+(\bar w_N-w)^TF(w)+\|\lambda\|\|A\bar x_N+B\bar y_N-b\|.$$
Then, by taking expectation, (20) is obtained.
Remark 1 (i) In Theorem 1 and Corollary 1, the $\tau_k$'s are constant, and $N$ needs to be selected in advance. In fact, $\tau_k$ can also vary with the number of iterations, e.g., $\tau_k=\sqrt k+M$. In this case, if the distance between $w^k$ and $w^*$ is bounded, i.e., $\|w^k-w^*\|^2\le R^2$ for any $k$, we can also obtain a worst-case convergence rate. The only difference from the proof idea of Theorem 1 and Corollary 1 lies in bounding the term $\sum_{t=0}^k\big(\|x^t-x^*\|_{G_{1,t}}^2-\|x^{t+1}-x^*\|_{G_{1,t}}^2\big)$, which is now bounded as follows:
$$\sum_{t=0}^k\big(\|x^t-x^*\|_{G_{1,t}}^2-\|x^{t+1}-x^*\|_{G_{1,t}}^2\big)\le M\|x^0-x^*\|^2+\sum_{i=0}^{k-1}(\tau_{i+1}-\tau_i)\|x^{i+1}-x^*\|^2-\|x^{k+1}-x^*\|_{G_{1,k}}^2\le\Big(M+\sum_{i=0}^{k-1}(\tau_{i+1}-\tau_i)\Big)R^2=(M+\sqrt k)R^2.$$
(ii) Corollary 1 reveals that the worst-case convergence rate of SSL-ADMM for solving general convex problems is $O(1/\sqrt N)$, where $N$ is the iteration number.
At the end of this section, we assume that $\theta_1$ is $\mu$-strongly convex, i.e., $\theta_1(x)\ge\theta_1(y)+\langle\nabla\theta_1(y),x-y\rangle+\frac{\mu}{2}\|x-y\|^2$ with $\mu>0$ for all $x,y\in\mathcal{X}$. With strong convexity, we can show not only that the objective error and constraint violation converge to zero in expectation, but also the convergence of the ergodic iterates of SSL-ADMM.
Theorem 2 Assume that $(r,s)\in\mathcal{D}$. Let the sequence $\{w^k\}$ be generated by the SSL-ADMM and the associated $\{\tilde w^k\}$ be defined in (2), and let
$$\bar w_k=\frac{1}{k}\sum_{t=1}^k\tilde w^t.$$
Choosing $\tau_k=\mu(k+1)+M$, where $M$ is a constant satisfying the ordering relation $MI_{n_1}\succeq LI_{n_1}+\beta A^TA$, SSL-ADMM has the following properties:
(i)
$$\mathbb{E}[\|A\bar x_k+B\bar y_k-b\|]\le\frac{1}{2k}\|(y^1,\lambda^1)-(y^*,\lambda^*+e)\|_{H_{1;2\times2}}^2+\frac{(1-r)\beta}{2(1+r)k}\|y^1-y^0\|_{G_2}^2+\frac{(1-s)^2\beta}{2(1+r)k}\|Ax^1+By^1-b\|^2+\frac{\mu}{2k}\|x^1-x^*\|^2+\frac{(1+\ln k)\sigma^2}{2\mu k},$$
(21)
(ii)
$$\mathbb{E}[\theta(\bar u_k)-\theta(u^*)]\le(\|\lambda^*\|+1)\Big(\frac{1}{2k}\|(y^1,\lambda^1)-(y^*,\lambda^*+e)\|_{H_{1;2\times2}}^2+\frac{(1-r)\beta}{2(1+r)k}\|y^1-y^0\|_{G_2}^2\Big)+(\|\lambda^*\|+1)\Big(\frac{(1-s)^2\beta}{2(1+r)k}\|Ax^1+By^1-b\|^2+\frac{\mu}{2k}\|x^1-x^*\|^2+\frac{(1+\ln k)\sigma^2}{2\mu k}\Big),$$
(22)
where $e$ is a unit vector satisfying $e^T(A\bar x_k+B\bar y_k-b)=-\|A\bar x_k+B\bar y_k-b\|$,
$$H_{1;2\times2}=\begin{pmatrix}\big(1-\frac{rs}{r+s}\big)\beta B^TB+G_2&-\frac{r}{r+s}B^T\\-\frac{r}{r+s}B&\frac{1}{\beta(r+s)}I_n\end{pmatrix},$$
and the expectation is taken conditional on $w^1$.
Proof First, similar to the proof of Lemma 2, using the $\mu$-strong convexity of $\theta_1$, we conclude that
$$\theta(u)-\theta(\tilde u^k)+(w-\tilde w^k)^TF(\tilde w^k)\ge(w-\tilde w^k)^TQ_k(w^k-\tilde w^k)-(x-\tilde x^k)^T\delta_k-\frac{L}{2}\|x^k-\tilde x^k\|^2+\frac{\mu}{2}\|x-x^k\|^2,\quad\forall w\in\Omega.$$
Then, using $Q_k=H_kM$, $(\tilde w^t-w)^TF(\tilde w^t)=(\tilde w^t-w)^TF(w)$, Lemma 4, and (17), we get
$$\theta(\tilde u^t)-\theta(u)+(\tilde w^t-w)^TF(w)\le\frac12\big(\|(y^t,\lambda^t)-(y,\lambda)\|_{H_{1;2\times2}}^2-\|(y^{t+1},\lambda^{t+1})-(y,\lambda)\|_{H_{1;2\times2}}^2\big)+(x-x^t)^T\delta_t+\frac{(1-s)^2\beta}{2(1+r)}\big(\|Ax^t+By^t-b\|^2-\|Ax^{t+1}+By^{t+1}-b\|^2\big)+\frac{1}{2\mu(t+1)}\|\delta_t\|^2+\frac{(1-r)\beta}{2(1+r)}\big(\|y^t-y^{t-1}\|_{G_2}^2-\|y^{t+1}-y^t\|_{G_2}^2\big)+\frac12\big(\mu t\|x^t-x\|^2-\mu(t+1)\|x^{t+1}-x\|^2\big).$$
Adding the above inequalities from $t=1$ to $k$, dividing by $k$, and then following the proof of Corollary 1, we prove the assertion of this theorem.
This theorem implies that, under the assumption that $\theta_1$ is strongly convex, the worst-case convergence rate of SSL-ADMM can be improved to $O((\ln k)/k)$ with the above choice of $\tau_k$, i.e., a diminishing step size. The following theorem shows the convergence of the ergodic iterates of SSL-ADMM, which is not covered in the earlier literature [21, 24]. Furthermore, if $\theta_2$ is also strongly convex, the assumption that $B$ has full column rank can be removed.
Theorem 3 Assume that $(r,s)\in\mathcal{D}$. Let the sequence $\{w^k\}$ be generated by the SSL-ADMM, the associated $\{\tilde w^k\}$ be defined in (2), and
$$\bar w_k=\frac{1}{k}\sum_{t=1}^k\tilde w^t.$$
Choosing $\tau_k=\mu(k+1)+M$, where $M$ is a constant satisfying the ordering relation $MI_{n_1}\succeq LI_{n_1}+\beta A^TA$, and assuming $B$ has full column rank with $\lambda_{\min}$ denoting the minimum eigenvalue of $B^TB$, we have
$$\mathbb{E}[\|\bar x_k-x^*\|+\|\bar y_k-y^*\|]\le\Big(1+\frac{\|A\|}{\sqrt{\lambda_{\min}}}\Big)\sqrt{\frac{2}{\mu}\big(\mathbb{E}[\theta(\bar u_k)-\theta(u^*)]+\|\lambda^*\|\,\mathbb{E}[\|A\bar x_k+B\bar y_k-b\|]\big)}+\frac{1}{\sqrt{\lambda_{\min}}}\mathbb{E}[\|A\bar x_k+B\bar y_k-b\|],$$
(23)
where the bounds for $\mathbb{E}[\|A\bar x_k+B\bar y_k-b\|]$ and $\mathbb{E}[\theta(\bar u_k)-\theta(u^*)]$ are the same as in (21) and (22), respectively, and the expectation is taken conditional on $w^1$.
Proof Since $(x^*,y^*,\lambda^*)$ is a solution of (1), we have $A^T\lambda^*=\nabla\theta_1(x^*)$ and $B^T\lambda^*\in\partial\theta_2(y^*)$. Hence, since $\theta_1$ is strongly convex and $\theta_2$ is convex, we have
$$\theta_1(\bar x_k)\ge\theta_1(x^*)+(\lambda^*)^T(A\bar x_k-Ax^*)+\frac{\mu}{2}\|\bar x_k-x^*\|^2$$
(24)
and
$$\theta_2(\bar y_k)\ge\theta_2(y^*)+(\lambda^*)^T(B\bar y_k-By^*).$$
(25)
Adding up (24) and (25), we get $\theta(\bar u_k)\ge\theta(u^*)+(\lambda^*)^T(A\bar x_k+B\bar y_k-b)+\frac{\mu}{2}\|\bar x_k-x^*\|^2$, that is,
$$\|\bar x_k-x^*\|\le\sqrt{\frac{2}{\mu}\big(\theta(\bar u_k)-\theta(u^*)-(\lambda^*)^T(A\bar x_k+B\bar y_k-b)\big)}\le\sqrt{\frac{2}{\mu}\big(\theta(\bar u_k)-\theta(u^*)+\|\lambda^*\|\|A\bar x_k+B\bar y_k-b\|\big)}.$$
(26)
On the other hand,
$$\|A\bar x_k+B\bar y_k-b\|=\|A(\bar x_k-x^*)+B(\bar y_k-y^*)\|\ge\|B(\bar y_k-y^*)\|-\|A\|\|\bar x_k-x^*\|,$$
which implies $\|B(\bar y_k-y^*)\|\le\|A\|\|\bar x_k-x^*\|+\|A\bar x_k+B\bar y_k-b\|$ and hence
$$\|\bar y_k-y^*\|\le\frac{\|A\|}{\sqrt{\lambda_{\min}}}\|\bar x_k-x^*\|+\frac{1}{\sqrt{\lambda_{\min}}}\|A\bar x_k+B\bar y_k-b\|.$$
(27)
Adding (26) and (27), using Jensen's inequality $\mathbb{E}[X^{\frac12}]\le(\mathbb{E}[X])^{\frac12}$ for a nonnegative random variable $X$, and taking expectation imply
$$\mathbb{E}[\|\bar x_k-x^*\|+\|\bar y_k-y^*\|]\le\Big(1+\frac{\|A\|}{\sqrt{\lambda_{\min}}}\Big)\sqrt{\frac{2}{\mu}\mathbb{E}\big[\theta(\bar u_k)-\theta(u^*)+\|\lambda^*\|\|A\bar x_k+B\bar y_k-b\|\big]}+\frac{1}{\sqrt{\lambda_{\min}}}\mathbb{E}[\|A\bar x_k+B\bar y_k-b\|].$$
The proof is completed.

4 High Probability Performance Analysis

In this section, we shall establish the large-deviation properties of SSL-ADMM. By (19) and (20) and Markov's inequality, we have, for any $\varepsilon_1>0$ and $\varepsilon_2>0$, that
$$\Pr\Big\{\|A\bar x_N+B\bar y_N-b\|\le\varepsilon_1\Big(\frac{1}{2N}\|w^1-(x^*,y^*,\lambda^*+e)\|_{H_1}^2+\frac{(1-r)\beta}{2(1+r)N}\|y^1-y^0\|_{G_2}^2+\frac{(1-s)^2\beta}{2(1+r)N}\|Ax^1+By^1-b\|^2+\frac{\sigma^2}{2\sqrt N}\Big)\Big\}\ge1-\frac{1}{\varepsilon_1}$$
(28)
and
$$\Pr\Big\{\theta(\bar u_N)-\theta(u^*)\le\varepsilon_2\Big((\|\lambda^*\|+1)\Big(\frac{1}{2N}\|w^1-(x^*,y^*,\lambda^*+e)\|_{H_1}^2+\frac{(1-r)\beta}{2(1+r)N}\|y^1-y^0\|_{G_2}^2\Big)+(\|\lambda^*\|+1)\Big(\frac{(1-s)^2\beta}{2(1+r)N}\|Ax^1+By^1-b\|^2+\frac{\sigma^2}{2\sqrt N}\Big)\Big)\Big\}\ge1-\frac{1}{\varepsilon_2}.$$
(29)
However, these bounds are rather weak. In the following, we show that these high probability bounds can be significantly improved by imposing a standard "light-tail" assumption, see, e.g., [1, 32]. Specifically, assume that for any $x\in\mathcal{X}$,
$$\mathbb{E}\big[\exp\{\|G(x,\xi)-\nabla\theta_1(x)\|^2/\sigma^2\}\big]\le\exp\{1\}.$$
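As a quick check, this condition implies b) in Assumption (iii): by Jensen's inequality applied to the convex function $\exp\{\cdot\}$,
$$\exp\big\{\mathbb{E}\big[\|G(x,\xi)-\nabla\theta_1(x)\|^2\big]/\sigma^2\big\}\le\mathbb{E}\big[\exp\{\|G(x,\xi)-\nabla\theta_1(x)\|^2/\sigma^2\}\big]\le\exp\{1\},$$
so taking logarithms gives $\mathbb{E}[\|G(x,\xi)-\nabla\theta_1(x)\|^2]\le\sigma^2$.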
This light-tail assumption is therefore a little stronger than b) in Assumption (iii). For the further analysis, we assume that $\mathcal{X}$ is bounded and denote its diameter by $D_X:=\max_{x_1,x_2\in\mathcal{X}}\|x_1-x_2\|$. The following theorem gives the high probability bounds for the objective error and constraint violation of SSL-ADMM.
Theorem 4 Assume that all the conditions in Theorem 1 hold; then SSL-ADMM has the following properties:
(i)
$$\Pr\Big\{\|A\bar x_N+B\bar y_N-b\|\le\frac{1}{2N}\|w^1-(x^*,y^*,\lambda^*+e)\|_{H_1}^2+\frac{(1-r)\beta}{2(1+r)N}\|y^1-y^0\|_{G_2}^2+\frac{(1-s)^2\beta}{2(1+r)N}\|Ax^1+By^1-b\|^2+\frac{\Theta D_X\sigma}{\sqrt N}+\frac{1}{2\sqrt N}(1+\Theta)\sigma^2\Big\}\ge1-\exp\{-\Theta^2/3\}-\exp\{-\Theta\},$$
(30)
(ii)
$$\Pr\Big\{\theta(\bar u_N)-\theta(u^*)\le(\|\lambda^*\|+1)\Big(\frac{1}{2N}\|w^1-(x^*,y^*,\lambda^*+e)\|_{H_1}^2+\frac{(1-r)\beta}{2(1+r)N}\|y^1-y^0\|_{G_2}^2\Big)+(\|\lambda^*\|+1)\Big(\frac{(1-s)^2\beta}{2(1+r)N}\|Ax^1+By^1-b\|^2+\frac{\Theta D_X\sigma}{\sqrt N}+\frac{1}{2\sqrt N}(1+\Theta)\sigma^2\Big)\Big\}\ge1-\exp\{-\Theta^2/3\}-\exp\{-\Theta\},$$
(31)
where $e$ is a unit vector satisfying $e^T(A\bar x_N+B\bar y_N-b)=-\|A\bar x_N+B\bar y_N-b\|$.
Proof Let $\zeta_t=\frac{1}{N}(x^*-x^t)^T\delta_t$. Clearly, $\{\zeta_t\}_{t\ge1}$ is a martingale-difference sequence. Moreover, it follows from the definition of $D_X$ and the light-tail assumption that
$$\mathbb{E}\big[\exp\{\zeta_t^2/(\tfrac{1}{N}D_X\sigma)^2\}\big]\le\mathbb{E}\big[\exp\{(\tfrac{1}{N}D_X\|\delta_t\|)^2/(\tfrac{1}{N}D_X\sigma)^2\}\big]\le\exp\{1\}.$$
Now, using Proposition 2 for the martingale-difference sequence, we have for any $\Theta\ge0$
$$\Pr\Big\{\sum_{t=1}^N\zeta_t>\frac{\Theta D_X\sigma}{\sqrt N}\Big\}\le\exp\{-\Theta^2/3\}.$$
(32)
Also, observe that by Jensen's inequality for the exponential function,
$$\exp\Big\{\frac{1}{N}\sum_{t=1}^N(\|\delta_t\|^2/\sigma^2)\Big\}\le\frac{1}{N}\sum_{t=1}^N\exp\{\|\delta_t\|^2/\sigma^2\},$$
whence, taking expectation,
$$\mathbb{E}\Big[\exp\Big\{\frac{1}{N}\sum_{t=1}^N\|\delta_t\|^2/\sigma^2\Big\}\Big]\le\frac{1}{N}\sum_{t=1}^N\mathbb{E}\big[\exp\{\|\delta_t\|^2/\sigma^2\}\big]\le\exp\{1\}.$$
It then follows from Markov's inequality that for any $\Theta\ge0$
$$\Pr\Big\{\frac{1}{N}\sum_{t=1}^N\|\delta_t\|^2\ge(1+\Theta)\sigma^2\Big\}\le\exp\{-\Theta\}.$$
(33)
Using (32) and (33) in (18) for $w=(x^*,y^*,\lambda^*+e)$, we conclude that
$$\Pr\Big\{\|A\bar x_N+B\bar y_N-b\|>\frac{1}{2N}\|w^1-(x^*,y^*,\lambda^*+e)\|_{H_1}^2+\frac{(1-r)\beta}{2(1+r)N}\|y^1-y^0\|_{G_2}^2+\frac{(1-s)^2\beta}{2(1+r)N}\|Ax^1+By^1-b\|^2+\frac{\Theta D_X\sigma}{\sqrt N}+\frac{1}{2\sqrt N}(1+\Theta)\sigma^2\Big\}\le\exp\{-\Theta^2/3\}+\exp\{-\Theta\}$$
(34)
and
$$\Pr\Big\{\theta(\bar u_N)-\theta(u^*)>(\|\lambda^*\|+1)\Big(\frac{1}{2N}\|w^1-(x^*,y^*,\lambda^*+e)\|_{H_1}^2+\frac{(1-r)\beta}{2(1+r)N}\|y^1-y^0\|_{G_2}^2\Big)+(\|\lambda^*\|+1)\Big(\frac{(1-s)^2\beta}{2(1+r)N}\|Ax^1+By^1-b\|^2+\frac{\Theta D_X\sigma}{\sqrt N}+\frac{1}{2\sqrt N}(1+\Theta)\sigma^2\Big)\Big\}\le\exp\{-\Theta^2/3\}+\exp\{-\Theta\}.$$
(35)
The result immediately follows from the above inequalities.
Remark 2 In view of Theorem 4, if we take $\Theta=\ln N$, then (noting that $\exp\{-(\ln N)^2/3\}\le N^{-2/3}$ for $N\ge e^2$) we have
$$\Pr\Big\{\|A\bar x_N+B\bar y_N-b\|\le O\Big(\frac{\ln N}{\sqrt N}\Big)\Big\}\ge1-\frac{1}{N^{2/3}}-\frac{1}{N}$$
and
$$\Pr\Big\{\theta(\bar u_N)-\theta(u^*)\le O\Big(\frac{\ln N}{\sqrt N}\Big)\Big\}\ge1-\frac{1}{N^{2/3}}-\frac{1}{N}.$$
For the strongly convex case, using a similar derivation, the high probability bounds for the objective error and constraint violation of SSL-ADMM are
$$\Pr\Big\{\|A\bar x_N+B\bar y_N-b\|\le O\Big(\frac{(\ln N)^2}{N}\Big)\Big\}\ge1-\frac{1}{N^{2/3}}-\frac{1}{N}$$
and
$$\Pr\Big\{\theta(\bar u_N)-\theta(u^*)\le O\Big(\frac{(\ln N)^2}{N}\Big)\Big\}\ge1-\frac{1}{N^{2/3}}-\frac{1}{N}.$$
Recall that the convergence rate of the ergodic iterates of SSL-ADMM was obtained in Theorem 3. The corresponding high probability bound can also be established, namely
$$\Pr\Big\{\|\bar x_N-x^*\|+\|\bar y_N-y^*\|\le O\Big(\frac{\ln N}{\sqrt N}\Big)\Big\}\ge1-\frac{1}{N^{2/3}}-\frac{1}{N},$$
where $N$ is the iteration number. In contrast to (28) and (29), the results in Theorem 4 are much sharper.

5 Preliminary Numerical Experiments

In this section, we report some numerical results on the following graph-guided fused lasso problem from statistical machine learning:
$$\min_x\ \mathbb{E}_\xi[f_\xi(x)]+\mu\|Ax\|_1,$$
where $f_\xi(x)=\log(1+\exp(-t\,l^Tx))$ is the logistic loss function on the feature-label pair $\xi=(l,t)\in\mathbb{R}^d\times\{+1,-1\}$, $\mu$ is a given regularization parameter, and $A=[G;I]$, where $G$ is obtained by sparse inverse covariance estimation [33]. By introducing another block variable $y$ and imposing the constraint $Ax=y$, this problem is reformulated into the form of (1) with $\theta_1(x)=\mathbb{E}_\xi[f_\xi(x)]$, $\theta_2(y)=\mu\|y\|_1$, $B=-I$, and $b=0$.
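As an illustration of this reformulation (our sketch, not the authors' experiment code), the following Python snippet builds the two-block structure and the soft-thresholding proximal mapping of $\theta_2(y)=\mu\|y\|_1$; the sparsity pattern `G` from the graphical lasso and the data are assumed to be given.

```python
import numpy as np

def build_graph_guided_fused_lasso(G, d, mu):
    """Cast min_x E_xi[f_xi(x)] + mu*||A x||_1 into form (1):
       theta_1(x) = E_xi[f_xi(x)],  theta_2(y) = mu*||y||_1,
       constraint A x - y = 0, i.e. B = -I, b = 0, with A = [G; I]."""
    A = np.vstack([G, np.eye(d)])
    n = A.shape[0]
    B = -np.eye(n)
    b = np.zeros(n)

    def prox_theta2(v, stepsize):
        # prox of stepsize * mu * ||.||_1: componentwise soft-thresholding
        thr = stepsize * mu
        return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

    return A, B, b, prox_theta2
```

Combined with a logistic SFO as sketched in Section 1 and the iteration sketch in Section 3, this suffices to reproduce the general shape of the experiment; the parameter values are those reported below.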
The datasets used in the numerical experiments are taken from the LIBSVM website1 and are summarized in Table 1. In our experiments, the regularization parameter $\mu$ and the penalty parameter $\beta$ are set to $1\times10^{-5}$ and $1\times10^{-3}$, respectively; the initial points are set to be uniformly random vectors in $[-1,1]^d$; the other parameters are chosen according to the conditions in the corollaries. We plot Opt_err, the maximum of the objective function value error and the constraint violation, versus CPU time in seconds, where the approximate optimal objective function value is obtained by running some convergent ADMM-type algorithm for more than 10 minutes. Figure 1 shows the performance of SSL-ADMM and of a generalized version of the algorithm in [21] (the algorithm in [21] is indeed a stochastic linearized ADMM; we call its generalized version SLG-ADMM, and it is more efficient than the original version), which updates the dual variable only once at each iteration. The results indicate that the improvement from symmetrically updating the dual variable also occurs in the stochastic setting. Since our aim is to demonstrate that symmetrically updating the multipliers remains effective for stochastic optimization problems, we only compare SSL-ADMM with the algorithm in [21] (which was already compared there with other algorithms for stochastic optimization), and not with algorithms designed for deterministic finite-sum optimization problems, e.g., those in [29, 30].
Table 1 Real-world datasets and regularization parameters
Dataset Number of samples Dimensionality
a8a 22696 123
a9a 32561 123
ijcnn1 49990 22
w8a 49749 300
Figure 1 Opt_err (vertical axis) vs the CPU time (s) (horizontal axis) for a8a, a9a, ijcnn1, and w8a, respectively


6 Summary

In this paper, we analyze the expected convergence rates and the large-deviation properties of a stochastic variant of symmetric ADMM using the variational inequality framework, which keeps the analysis transparent. Numerical experiments on some real-world datasets demonstrate that symmetrically updating the dual variable can lead to an algorithmic improvement in the stochastic setting. When the model is deterministic and the SFO is not needed, our proposed algorithm reduces to a symmetric proximal ADMM, and the convergence region of $(r,s)$ is the same as that in the corresponding literature.

References

1
Nemirovski A, Juditsky A, Lan G, et al. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 2009, 19 (4): 1574- 1609.
2
Lan G. An optimal method for stochastic composite optimization. Mathematical Programming, 2012, 133 (1): 365- 397.
3
Ghadimi S, Lan G, Zhang H. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming, 2016, 155 (1): 267- 305.
4
Robbins H, Monro S. A stochastic approximation method. The Annals of Mathematical Statistics, 1951, 22 (3): 400- 407.
5
Boyd S, Parikh N, Chu E, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2011, 3 (1): 1- 122.
6
Glowinski R, Marroco A. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. Revue Française d'Automatique, Informatique, Recherche Opérationnelle. Analyse Numérique, 1975, 9 (2): 41- 76.
7
Gabay D, Mercier B. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 1976, 2 (1): 17- 40.
8
Glowinski R. On alternating direction methods of multipliers: A historical perspective. Modeling, Simulation and Optimization for Science and Technology, Springer, Dordrecht, 2014: 59-82.
9
Eckstein J, Bertsekas D P. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 1992, 55 (1): 293- 318.
10
He B, Yuan X. On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM Journal on Numerical Analysis, 2012, 50 (2): 700- 709.
11
Monteiro R D C, Svaiter B F. Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM Journal on Optimization, 2013, 23 (1): 475- 507.
12
He B, Yuan X. On non-ergodic convergence rate of Douglas-Rachford alternating direction method of multipliers. Numerische Mathematik, 2015, 130 (3): 567- 577.
13
Deng W, Yin W. On the global and linear convergence of the generalized alternating direction method of multipliers. Journal of Scientific Computing, 2016, 66 (3): 889- 916.
14
Yang W H, Han D. Linear convergence of the alternating direction method of multipliers for a class of convex optimization problems. SIAM Journal on Numerical Analysis, 2016, 54 (2): 625- 640.
15
Han D, Sun D, Zhang L. Linear rate convergence of the alternating direction method of multipliers for convex composite programming. Mathematics of Operations Research, 2018, 43 (2): 622- 637.
16
Li G, Pong T K. Global convergence of splitting methods for nonconvex composite optimization. SIAM Journal on Optimization, 2015, 25 (4): 2434- 2460.
17
Wang Y, Yin W, Zeng J. Global convergence of ADMM in nonconvex nonsmooth optimization. Journal of Scientific Computing, 2019, 78 (1): 29- 63.
18
Jiang B, Lin T, Ma S, et al. Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Computational Optimization and Applications, 2019, 72 (1): 115- 157.
19
Zhang J, Luo Z Q. A proximal alternating direction method of multiplier for linearly constrained nonconvex minimization. SIAM Journal on Optimization, 2020, 30 (3): 2272- 2302.
20
Han D R. A survey on some recent developments of alternating direction method of multipliers. Journal of the Operations Research Society of China, 2022, 10 (1): 1- 52.
21
Ouyang H, He N, Tran L, et al. Stochastic alternating direction method of multipliers. International Conference on Machine Learning, 2013, 80- 88.
22
Suzuki T. Dual averaging and proximal gradient descent for online alternating direction multiplier method. International Conference on Machine Learning, 2013, 392- 400.
23
Zhao P, Yang J, Zhang T, et al. Adaptive stochastic alternating direction method of multipliers. International Conference on Machine Learning, 2015, 69- 77.
24
Gao X, Jiang B, Zhang S. On the information-adaptive variants of the ADMM: An iteration complexity perspective. Journal of Scientific Computing, 2018, 76 (1): 327- 363.
25
He B, Liu H, Wang Z, et al. A strictly contractive Peaceman-Rachford splitting method for convex programming. SIAM Journal on Optimization, 2014, 24 (3): 1011- 1040.
26
He B, Ma F, Yuan X. Convergence study on the symmetric version of ADMM with larger step sizes. SIAM Journal on Imaging Sciences, 2016, 9 (3): 1467- 1501.
27
Bai J, Li J, Xu F, et al. Generalized symmetric ADMM for separable convex optimization. Computational Optimization and Applications, 2018, 70 (1): 129- 170.
28
Lions P L, Mercier B. Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 1979, 16 (6): 964- 979.
29
Bai J, Hager W W, Zhang H. An inexact accelerated stochastic ADMM for separable convex optimization. Computational Optimization and Applications, 2022, 81 (2): 479- 518.
30
Bai J, Han D, Sun H, et al. Convergence analysis of an inexact accelerated stochastic ADMM with larger stepsizes. CSIAM Transactions on Applied Mathematics, 2022, 3 (3): 448- 479.
31
He B S. On the convergence properties of alternating direction method of multipliers. Numerical Mathematics, 2017, 39, 81- 96.
32
Lan G. First-order and stochastic optimization methods for machine learning, Springer, New York, 2020.
33
Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 2008, 9 (3): 432- 441.

Funding

National Natural Science Foundation of China(61662036)