
Semi-supervised learning (SSL) has been applied to many practical applications over the past few years. Recently, distributed graph-based semi-supervised learning (DGSSL) has been shown to have good performance. Current DGSSL algorithms usually have the problems of inefficient graph construction and the straggler effect. This paper proposes a novel coded DGSSL (CDGSSL) to solve these problems. We first provide a novel parallel and distributed solution of matrix completion for efficient graph construction. Then, we develop the CDGSSL algorithm based on coding theory. Specifically, the proposed algorithm consists of two parts separately designed based on the maximum distance separable (MDS) code. In general, the proposed coded distributed algorithm is efficient and straggler tolerant. Moreover, we provide an optimal parameter design for the proposed algorithm. The results of the experiments on the Alibaba Cloud elastic compute service (ECS) demonstrate the superiority of the proposed algorithm.
The proposed algorithm consists of two parts: coded matrix completion and coded label propagation.
Semi-supervised learning (SSL) has been of significant interest in machine learning and artificial intelligence over the past few years[1]. Compared with supervised learning (SL) and unsupervised learning (UL), SSL requires less human effort but gives higher accuracy[2]. SSL utilizes the information of both labeled and unlabeled data to obtain better performance than SL and UL. Thus, how to discover the input data distribution by exploiting unlabeled data is crucial in SSL[3].
The initial SSL algorithm was self-training in Ref. [4], which used the data set with all attributes as input data. Co-training was another early SSL algorithm in Ref. [5], which assumed that the input data set consists of two subsets with different views. In general, three assumptions on the input data space are commonly used in SSL: smoothness, cluster, and manifold. Using these assumptions, various effective algorithms have been proposed[6–8]. Among these methods, graph-based SSL (GSSL) has attracted growing attention because it usually involves optimizing a convex objective and is both easily scalable and parallelizable. In GSSL, the original data structure is represented by a graph whose edge weights measure the similarity between data samples[9]. The labeled information can then be propagated to the unlabeled samples through the graph, a process known as label propagation. Most GSSL methods address label propagation by solving a convex quadratic optimization problem for the feature vectors. Notable GSSL methods proposed in the literature include the Gaussian field and harmonic function (GFHF) method[10], local and global consistency[11], the regularized Laplacian[12], and random walks[13].
The previously mentioned schemes are then applied to machine learning tasks such as clustering, classification, and regression. Semi-supervised clustering aims to obtain better clusters than unsupervised clustering[14]. Semi-supervised classification (SSC) and semi-supervised regression (SSR) have been shown to outperform supervised classification and regression[15,16]. Although these algorithms can be applied to many problems in practice, most of them have focused on centralized learning. However, distributed SSL protocols are urgently needed when data are spread over multiple nodes, as in medical diagnostics[17], distributed music classification[18], and distributed multimedia classification[19].
To effectively utilize data separately stored over communication networks, several distributed SSL (DSSL) approaches have been designed. Shen et al.[20] proposed two distributed semi-supervised metric learning frameworks using diffusion cooperation and alternating direction method of multipliers (ADMM) strategies. Scardapane et al.[21] proposed a DSSL algorithm based on the in-network successive convex approximation (NEXT) framework. Owing to the scalability and parallelizability of GSSL, some works have proposed distributed GSSL (DGSSL). DGSSL focuses on constructing an efficient graph and preserving data privacy in the multi-agent case. Fierimonte et al.[22] proposed the D-LapRLS algorithm with the kernel method and the distributed average consensus (DAC) strategy. In addition, they proposed a distributed matrix completion algorithm based on the framework of diffusion adaptation to calculate the global Euclidean distance matrix (EDM). Güler et al.[23] made use of the privacy-preserving protocols of multiparty computation (MPC) and homomorphic encryption (HE) to propose a DGSSL algorithm.
However, most existing DGSSL designs suffer from the following two problems. First, constructing a graph from dispersedly stored data is a major problem for DGSSL. To solve this problem, existing DGSSL designs have provided several graph construction algorithms, mainly focusing on improving classification accuracy and protecting data privacy. These algorithms require extensive communication and computation, which calls for more efficient schemes with lower complexity. Second, most DGSSL schemes solve the optimization problem for the feature vectors using iterative solutions. The iteration time is determined by the slowest node, referred to as the straggler node. Thus, the execution time of current state-of-the-art DGSSL schemes is limited by straggler nodes. Moreover, the failure of any node may prevent an iteration from completing, resulting in prolonged training time.
Motivated by the above observations, this paper proposes coded DGSSL (CDGSSL) to efficiently construct the graph and alleviate the effect of straggler nodes. In consideration of the classification performance, we adopt EDM completion for graph construction. Although EDM completion has been applied in the D-LapRLS algorithm, its key step requires every node to compute the global EDM update and then average the updates. Such a solution has a high computational cost, growing polynomially with the global graph size. We provide a novel parallel and distributed approach to global EDM completion that reduces this computational cost. To address the straggler problem, we further develop a coded computation design for DGSSL by adopting the maximum distance separable (MDS) code. Note that although coded computation has been applied to matrix multiplication to solve the straggler problem[24–26], how to employ it in DGSSL is still unknown.
In the present work, we assume that a dataset is partitioned and arbitrarily distributed over different nodes. Every node can communicate with a central server. Our purpose is to use DGSSL for a binary classification problem without exchanging any data points. Following the generalized formulation in Ref. [10], most of the information is encoded in a matrix of pairwise distances between data samples. In the initial phase of the algorithm, each node computes the local distance matrix with its data before sending it to the server. To complete the global distance matrix, we address the EDM completion problem using a gradient descent method similar to previous works[27–30]. Moreover, we also address label propagation using the gradient descent method. Then, we provide a parallel and distributed solution to reduce the computational cost per iteration. Each iteration requires every node to compute a partial gradient and send the result back to the server. In addition, we further develop a coded DGSSL algorithm consisting of two component algorithms, namely coded matrix completion and coded label propagation. Since the computation is dominated by matrix-vector or matrix-matrix multiplications, our algorithm works by encoding the computational matrix with an MDS code. In this light, the server only requires partial results to decode. Furthermore, we offer the optimal parameter design for the proposed algorithm. The proposed scheme can be applied to travel mode identification in intelligent transportation systems (ITS), which can help institutions better understand human behaviors and improve urban management and planning[31]. The main contributions of this paper are summarized as follows.
(ⅰ) A novel distributed EDM completion algorithm for efficient graph construction. We adopt EDM completion to improve the classification performance of DGSSL. The solution in Ref. [22] has high computation costs because each node has to compute the global EDM update. We overcome this problem by providing a parallel and distributed solution for global EDM completion. Instead of computing the global EDM update, each node only needs to compute a partial gradient. Therefore, it reduces computational cost but with similar classification performance.
(ⅱ) A coded computation design for DGSSL. According to the characteristic that coded computation declines the straggler effect, we provide a novel coded computation design for DGSSL in this work. To exploit the coded computation solutions, we adopt gradient descent methods so that the computation is concentrated on matrix-vector or matrix-matrix multiplication. The server only requires partial results to obtain the gradient by encoding the computational matrix with MDS code.
(ⅲ) Optimal parameter design for the proposed algorithm. The coding parameter and the matrix completion parameter are important for the algorithm execution time, classification performance, and straggler tolerance. We optimize the coding parameter based on the expected overall runtime. We then provide the optimal matrix completion parameter design by introducing a novel measure that quantifies the accuracy of the matrix completion algorithm.
The rest of this paper is organized as follows. First, Section 2 introduces the system model for the DSSL and the theoretical tools upon which the proposed algorithm is based. In Section 3, we detail our proposed framework for CDGSSL. In particular, we introduce the matrix completion problem in Section 3.1, propose a DGSSL under a distributed computing framework in Section 3.2, and develop the CDGSSL algorithm in Section 3.3. In addition, we introduce the optimal parameter design in the algorithm in Section 4. After that, in Section 5, we give the numerical results of the proposed algorithm to illustrate its effectiveness. Finally, conclusions are drawn in Section 6.
In this paper, we use boldface lowercase letters to denote vectors, e.g.,
This paper considers a DGSSL scenario for a binary classification problem with
Data | \boldsymbol{x}_{i}
Data label | \boldsymbol{y}_{i}
Local data set | \boldsymbol{X}_{i}=\{\boldsymbol{x}_{ij}, j=1,\cdots,M_{i}\}
Local data label | \boldsymbol{Y}_{i}=\{\boldsymbol{y}_{ij}, j=1,\cdots,M_{i}\}
Global data set | \boldsymbol{X}=\boldsymbol{X}_{1}\cup \cdots \cup \boldsymbol{X}_{N}
Global data label | \boldsymbol{Y}=\boldsymbol{Y}_{1}\cup \cdots \cup \boldsymbol{Y}_{N}
Subscript l | Labeled
Subscript u | Unlabeled
Unlabeled data set | \boldsymbol{X}_{u}
Labeled data set | \boldsymbol{X}_{l}
Local unlabeled data set | \boldsymbol{U}_{i}=\boldsymbol{X}_{u}\cap \boldsymbol{X}_{i}
Local labeled data set | \boldsymbol{S}_{i}=\boldsymbol{X}_{l}\cap \boldsymbol{X}_{i}
Weight matrix | \boldsymbol{W}
Classifier | f:\boldsymbol{V} \to \mathbb{R}
Euclidean distance matrix | \boldsymbol{D}
Diagonal matrix | \boldsymbol{H}
Combinatorial Laplacian | \Delta
The proposed algorithm is based on GSSL algorithms, which can process large-scale data sets in practice and are easy to parallelize[9]. GSSL setups can be divided into the following two steps.
Graph construction: Construct a graph whose edge weights measure the similarity between data samples.
Label propagation: Propagate label information to unlabeled data through the graph.
The centralized GSSL algorithm works as follows. In the GSSL algorithm, the graph
\begin{equation} {{({\boldsymbol{W}})}_{ij}}={{w}_{ij}}=\left\{ \begin{matrix} \exp \left( \dfrac{-\left\| {{{\boldsymbol{x}}}_{i}}-{{{\boldsymbol{x}}}_{j}} \right\|_{2}^{2}}{2{{\sigma }^{2}}} \right),i\ne j; \\ 0,\text{ otherwise }. \\ \end{matrix}\right. \end{equation} | (1) |
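For concreteness, the following sketch (in Python with NumPy) builds the Gaussian weight matrix of Eq. (1) from a data matrix of shape (m, d); the function name and the default bandwidth are our own illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def gaussian_weight_matrix(X, sigma=1.0):
    """Build the Gaussian weight matrix of Eq. (1) for data X of shape (m, d)."""
    # Pairwise squared Euclidean distances ||x_i - x_j||^2.
    sq_norms = np.sum(X ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    D = np.maximum(D, 0.0)          # guard against small negative round-off
    W = np.exp(-D / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)        # w_ii = 0 by definition
    return W
```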
For the binary classification problem, a real-valued function
\begin{equation} E(f)=\frac{1}{2}\sum\limits_{i=1}^{m}{\sum\limits_{j=1}^{m}{{{w}_{ij}}}}{{\left( f\left( {{{\boldsymbol{x}}}_{i}} \right)-f\left( {{{\boldsymbol{x}}}_{j}} \right) \right)}^{2}}={{\mathit{\boldsymbol{f}}}^{{\rm{T}}}}\Delta \mathit{\boldsymbol{f}}, \end{equation} | (2) |
where f
\begin{equation} f({{{\boldsymbol{x}}}_{j}})=\frac{1}{{{h}_{j}}}\sum\limits_{i=1}^{m}{{{w}_{ij}}f}({{{\boldsymbol{x}}}_{i}}). \end{equation} | (3)
This is consistent with the smoothness of the graph. Thus, label information can be propagated to unlabeled data through the graph by solving the problem
\begin{equation} \min\limits_{\mathit{\boldsymbol{f}}_{u}} E(f). \end{equation} | (4) |
To explicitly calculate the results with matrix operations, the weight matrix
\begin{align} \min E(f) =& {{\mathit{\boldsymbol{f}}}^{{\rm{T}}}}({\boldsymbol{H}}-{\boldsymbol{W}})\mathit{\boldsymbol{f}} =\\ &\mathit{\boldsymbol{f}}_{l}^{{\rm{T}}}\left( {{{\boldsymbol{H}}}_{ll}}-{{{\boldsymbol{W}}}_{ll}} \right){{\mathit{\boldsymbol{f}}}_{l}}+\mathit{\boldsymbol{f}}_{u}^{{\rm{T}}}\left( {{{\boldsymbol{H}}}_{uu}}-{{{\boldsymbol{W}}}_{uu}} \right){{\mathit{\boldsymbol{f}}}_{u}}- \\ & 2\mathit{\boldsymbol{f}}_{u}^{{\rm{T}}}{{{\boldsymbol{W}}}_{ul}}{{\mathit{\boldsymbol{f}}}_{l}}, \end{align} | (5) |
where we have
\begin{equation} {\boldsymbol{W}}=\left[ \begin{matrix} {{{\boldsymbol{W}}}_{ll}} & {{{\boldsymbol{W}}}_{lu}} \\ {{{\boldsymbol{W}}}_{ul}} & {{{\boldsymbol{W}}}_{uu}} \\ \end{matrix} \right],{\boldsymbol{H}}=\left[ \begin{matrix} {{{\boldsymbol{H}}}_{ll}} & {{{\bf{0}}}_{lu}} \\ {{{\bf{0}}}_{ul}} & {{{\boldsymbol{H}}}_{uu}} \\ \end{matrix} \right]. \end{equation} | (6) |
The explicit expression for the optimal solution to Eq. (5) is
\begin{equation} {{\mathit{\boldsymbol{f}}}_{u}}={{({{{\boldsymbol{H}}}_{uu}}-{{{\boldsymbol{W}}}_{uu}})}^{-1}}{{{\boldsymbol{W}}}_{ul}}{{\mathit{\boldsymbol{f}}}_{l}}. \end{equation} | (7) |
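As an illustration of the closed-form solution in Eq. (7), a minimal sketch is given below; it assumes the labeled samples occupy the first n_labeled indices of W, which is an ordering convention of ours, and it solves the linear system rather than forming the inverse.

```python
import numpy as np

def harmonic_solution(W, f_l, n_labeled):
    """Closed-form GFHF solution of Eq. (7): f_u = (H_uu - W_uu)^{-1} W_ul f_l."""
    H = np.diag(W.sum(axis=1))                 # diagonal degree matrix
    H_uu = H[n_labeled:, n_labeled:]
    W_uu = W[n_labeled:, n_labeled:]
    W_ul = W[n_labeled:, :n_labeled]
    # Solve the linear system instead of explicitly inverting H_uu - W_uu.
    f_u = np.linalg.solve(H_uu - W_uu, W_ul @ f_l)
    return f_u
```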
Considering the high complexity of the matrix inversion in (7), centralized SSL algorithms are not suitable for large-scale data sets. Thus, in this paper we focus on an algorithm belonging to the distributed computing framework in order to apply DGSSL to large-scale data sets.
In the distributed SSL system described in Fig. 1, assume that the
\begin{equation} {\boldsymbol{W}}=\left[ \begin{matrix} {{{\boldsymbol{W}}}_{1}} & ? & ? \\ ? & \ddots & ? \\ ? & ? & {{{\boldsymbol{W}}}_{N}} \\ \end{matrix} \right]. \end{equation} | (8) |
As far as accuracy is concerned, the missing parts of the matrix
In this paper, instead of completing the weight matrix, we consider completing the global EDM and then using it to calculate the weight matrix. Besides the Gaussian weight matrix, the Laplacian[12] and the kernel matrices of all kernel functions based on Euclidean distance can be obtained from the global EDM.
The Euclidean distance matrix completion problem is formulated as follows. Consider a Euclidean distance matrix
\begin{equation} {{({\boldsymbol{D}})}_{ij}}=\left\| {{{\boldsymbol{x}}}_{i}}-{{{\boldsymbol{x}}}_{j}} \right\|_{2}^{2}, \end{equation} | (9) |
where
\begin{equation} \mathop {\min }\limits_{{\boldsymbol{D}} \in {\rm{EDM}}\left( {\Sigma {M_i}} \right)} \, \,\|{{P}_{\boldsymbol{\varOmega }}}(\widehat{{\boldsymbol{D}}}-{\boldsymbol{D}})\|_{F}^{2}, \end{equation} | (10) |
where
According to the Schoenberg map between Euclidean distance matrices and positive semidefinite matrices, the problem in (10) is reformulated as a semidefinite programming problem
\begin{align} &{\mathop{\min }_{{\boldsymbol{D}}}}\|{{P}_{{\bf{\Omega }}}}(\widehat{{\boldsymbol{D}}}-\chi ({\boldsymbol{D}}))\|_{F}^{2} \\ &\text{ s}\text{.t}\text{. }{\boldsymbol{D}}\succcurlyeq 0, \end{align} | (11) |
where the linear operator
A positive semidefinite matrix of rank
\begin{equation} \mathop {\min }\limits_{{V}{{V}^{\rm{T}}} \in {S_ + }\left( {r,\Sigma {M_i}} \right)} \,\| {{P}_{{\bf{\Omega }}}}\left( \widehat{{\boldsymbol{D}}}-\chi \left( {\boldsymbol{V}}{{{\boldsymbol{V}}}^{{\rm{T}}}} \right) \right)\|_{F}^{2}, \end{equation} | (12) |
where we have
\begin{equation} {{S}_+}\left( r,\sum M_i \right)=\left\{ {\boldsymbol{U}}\in {{\mathbb{R}}^{\sum M_i\times \sum M_i}}:{\boldsymbol{U}}={{{\boldsymbol{U}}}^{{\rm{T}}}}\succcurlyeq 0, \operatorname{rank}({\boldsymbol{U}})=r \right\}.\nonumber \end{equation} |
Solving this matrix completion optimization problem on a single node has been studied extensively. However, all single-node algorithms have high computational overhead and therefore do not apply to large-scale data sets. Thus, we consider extending the optimization problem to a distributed computing system.
We propose a novel DGSSL algorithm under distributed computing framework, which consists of three main stages.
\begin{equation} \widehat{{\boldsymbol{D}}}:={{{\boldsymbol{D}}}_{1}}\oplus {{{\boldsymbol{D}}}_{2}}\oplus \cdots \oplus {{{\boldsymbol{D}}}_{N}}=\oplus _{i=1}^{N}{{{\boldsymbol{D}}}_{i}}, \end{equation} | (13) |
where
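A minimal sketch of assembling the block-diagonal matrix in Eq. (13) and the corresponding observation mask Ω is given below; the helper name and the Boolean-mask representation of Ω are our own choices.

```python
import numpy as np
from scipy.linalg import block_diag

def assemble_incomplete_edm(local_edms):
    """Assemble the block-diagonal matrix of Eq. (13) from the local EDMs
    D_1, ..., D_N and return it together with the observation mask Omega.
    Off-diagonal blocks (cross-node distances) are unknown and masked out."""
    D_hat = block_diag(*local_edms)
    omega = block_diag(*[np.ones(D_i.shape) for D_i in local_edms]).astype(bool)
    return D_hat, omega
```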
Matrix completion: Iteratively solve the EDM completion problem in (12). The iterative solution to (12) is given by
\begin{align} {{\nabla }_{{\boldsymbol{V}}}}F({\boldsymbol{V}})&={{\chi }^{*}}\left\{ {{P}_{{\bf{\Omega }}}}(\chi \left( {\boldsymbol{V}}[n]{{{\boldsymbol{V}}}^{{\rm{T}}}}[n] \right)-\widehat{{\boldsymbol{D}}}) \right\}{\boldsymbol{V}}[n], \end{align} | (14) |
\begin{align} &{\boldsymbol{V}}[n+1]={\boldsymbol{V}}[n]-\beta {{\nabla }_{{\boldsymbol{V}}}}F({\boldsymbol{V}}), \end{align} | (15) |
where
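To make Eqs. (12)–(15) concrete, a minimal single-machine sketch of the gradient descent is given below; the Schoenberg map χ and its adjoint χ* follow the standard definitions, while the step size, iteration count, and random initialization are assumed values rather than the paper's settings.

```python
import numpy as np

def chi(G):
    """Schoenberg map: turn a Gram matrix G into an EDM."""
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2.0 * G

def chi_adj(A):
    """Adjoint of the Schoenberg map for a symmetric matrix A."""
    return 2.0 * (np.diag(A.sum(axis=1)) - A)

def edm_completion(D_hat, omega, r, beta=1e-5, n_iter=500, seed=0):
    """Gradient descent for the rank-r EDM completion of Eqs. (12)-(15)."""
    m = D_hat.shape[0]
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((m, r)) * 0.01          # low-rank factor V[0]
    for _ in range(n_iter):
        residual = omega * (chi(V @ V.T) - D_hat)   # P_Omega(chi(VV^T) - D_hat)
        grad = chi_adj(residual) @ V                # Eq. (14)
        V = V - beta * grad                         # Eq. (15)
    return chi(V @ V.T)                             # completed EDM
```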
Label propagation: Iteratively solve the optimization problem for
\begin{align} \mathit{\boldsymbol{f}}_{u}^{t+1}&=\mathit{\boldsymbol{f}}_{u}^{t}-\alpha \nabla E(\mathit{\boldsymbol{f}}) =\\ &\mathit{\boldsymbol{f}}_{u}^{t}-2\alpha \left( {{{\boldsymbol{H}}}_{uu}}-{{{\boldsymbol{W}}}_{uu}} \right)\mathit{\boldsymbol{f}}_{u}^{t}+2\alpha{{{\boldsymbol{W}}}_{ul}}{{\mathit{\boldsymbol{f}}}_{l}}. \end{align} | (16)
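A corresponding single-machine sketch of the label propagation iteration in Eq. (16) follows; the step size, iteration count, zero initialization of f_u, and the convention that labeled samples come first are assumptions of ours.

```python
import numpy as np

def label_propagation_gd(W, f_l, n_labeled, alpha=1e-3, n_iter=1000):
    """Iterative label propagation of Eq. (16) by gradient descent."""
    H = np.diag(W.sum(axis=1))
    H_uu = H[n_labeled:, n_labeled:]
    W_uu = W[n_labeled:, n_labeled:]
    W_ul = W[n_labeled:, :n_labeled]
    f_u = np.zeros(W.shape[0] - n_labeled)              # initial guess f_u^0
    for _ in range(n_iter):
        grad = 2.0 * (H_uu - W_uu) @ f_u - 2.0 * W_ul @ f_l   # gradient of E(f)
        f_u = f_u - alpha * grad                         # Eq. (16)
    return f_u
```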
Concerning the distributed system, the algorithm can be completed with storage nodes executing computation tasks assigned by the server. The amount of computation for each iteration in (15) and (16) is mainly on matrix-vector or matrix-matrix multiplication. Thus, the partial sum can be calculated on different nodes, and then the results are obtained by adding all the partial sums on the central node. In this way, the proposed algorithm is based on the distributed computing framework. The distributed computation setting can be seen in Fig. 2.
Nevertheless, the system may be significantly affected by straggler nodes and system failures when performing the distributed matrix-vector or matrix-matrix multiplications. In each iteration, the central node has to wait for all nodes to return their results before it can recover the final result. Thus, the slowest (straggler) node determines the per-iteration time of the algorithm. In addition, if a node in the system fails, the entire algorithm may fail. Given the above situations, we further develop a coded distributed SSL by adopting coding theory to solve these problems.
We propose a coded distributed SSL algorithm based on the MDS code to make it robust to stragglers and system failures. In the distributed setting, each client can only calculate its own distance matrix from its local training data, and the distances between data held by different clients are unknown. An approximate global distance matrix is obtained by performing distributed matrix completion on the block-diagonal distance matrix that the central server assembles from the received local blocks. Therefore, the proposed algorithm consists of two component algorithms.
Calculate the distance matrix
\begin{equation} {{\nabla }_{{\boldsymbol{V}}}}F({\boldsymbol{V}})={\boldsymbol{WV}}[n]-{\boldsymbol{TV}}[n], \end{equation} | (17) |
where we have
\begin{equation} {\boldsymbol{T}}:={{\chi }^{*}}({{P}_{{\boldsymbol{\Omega }}}}(\widehat{{\boldsymbol{D}}})),{\boldsymbol{W}}:={{\chi }^{*}}\left\{ {{P}_{{\boldsymbol{\Omega }}}}(\chi \left( {\boldsymbol{V}}[n]{{{\boldsymbol{V}}}^{{\rm{T}}}}[n] \right)) \right\}. \end{equation} | (18) |
Here we use
A single iteration of coded distributed matrix completion is shown in Fig. 3. The pseudocode of coded EDM completion is summarized in Algorithm 1, where the maximum number of iterations is denoted as
Algorithm 1: Coded matrix completion
Input: Incomplete global EDM matrix
Output: Optimal EDM matrix
1 Compute
2 Send
3 for
4 Multicast
5 for client
6 Compute
7 Compute
8 Send
9
10 while:
11 on Receiving message
12
13
14 Compute
15 Compute
16 Return
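The core of Algorithm 1 is an MDS-coded matrix-matrix multiplication: the computational matrix is split into k row blocks, encoded into N coded blocks, and the desired product can be decoded from any k returned results. The sketch below illustrates this with a random real-valued generator matrix (which is MDS with probability one); it is not the paper's exact code construction, and all function names and example sizes are ours.

```python
import numpy as np

def mds_encode_blocks(A, N, k, seed=0):
    """Split A row-wise into k equal blocks (assumes A.shape[0] % k == 0) and
    encode them into N coded blocks coded[j] = sum_l G[j, l] * A_l."""
    blocks = np.stack(np.split(A, k, axis=0))            # (k, rows, cols)
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((N, k))                      # random generator matrix
    coded = np.einsum('nk,krc->nrc', G, blocks)          # (N, rows, cols)
    return coded, G

def mds_decode(results, worker_ids, G):
    """Recover A @ B from the coded partial products results[j] = coded[j] @ B
    returned by any k workers (the first k entries of worker_ids)."""
    k = G.shape[1]
    ids = list(worker_ids)[:k]
    G_sub = G[ids, :]                                    # invertible k x k sub-matrix
    stacked = np.stack([results[j] for j in ids])        # (k, rows, out_cols)
    decoded = np.linalg.solve(G_sub, stacked.reshape(k, -1))
    return decoded.reshape(-1, stacked.shape[-1])        # stacked blocks of A @ B

# Example with assumed sizes: the server encodes once, each worker j computes
# results[j] = coded[j] @ B, and the server decodes from the first k replies.
A, B = np.random.rand(9, 5), np.random.rand(5, 4)
coded, G = mds_encode_blocks(A, N=5, k=3)
results = {j: coded[j] @ B for j in (4, 0, 2)}           # only 3 of 5 workers reply
assert np.allclose(mds_decode(results, [4, 0, 2], G), A @ B)
```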
Optimize the distribution strategy for the label propagation algorithm and make it fault tolerant to straggler nodes. We adopt coding technology to increase the fault tolerance of the algorithm to stragglers as well. Define
A single iteration of coded label propagation is shown in Fig. 4. The pseudocode of the entire coded label propagation algorithm is summarized in Algorithm 2, where the largest number of iterations in the label propagation is denoted as
Algorithm 2: Coded label propagation
Input: Distance matrix
Output: Optimal unlabeled label
1 Compute
2 Compute
3 Send
4 for
5 Multicast
6
7 for client
8 Compute
9 Send
10 while:
11 on Receiving message
12
13
14 Compute
15 Return
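Analogously, each iteration of Algorithm 2 reduces to an MDS-coded matrix-vector multiplication (with the computational matrix H_uu − W_uu). The sketch below simulates one such coded iteration on a single machine, including a straggler that is simply never waited for; the values of N, k, the random generator construction, and the straggler set are illustrative assumptions.

```python
import numpy as np

def coded_matvec(A, x, N, k, straggler_ids=(), seed=0):
    """One MDS-coded iteration: the server needs A @ x and can ignore up to
    N - k stragglers. In the real system each coded block lives on a client."""
    assert A.shape[0] % k == 0
    blocks = np.stack(np.split(A, k, axis=0))            # k row blocks of A
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((N, k))                      # generator matrix
    coded = np.einsum('nk,krc->nrc', G, blocks)          # distributed once, offline
    # Workers that are not stragglers return their coded partial product;
    # the server keeps only the first k responses.
    finished = [j for j in range(N) if j not in straggler_ids][:k]
    results = {j: coded[j] @ x for j in finished}
    G_sub = G[finished, :]                               # k x k sub-generator
    stacked = np.stack([results[j] for j in finished])   # (k, rows_per_block)
    decoded = np.linalg.solve(G_sub, stacked)            # undo the encoding
    return decoded.reshape(-1)

# Example (assumed sizes): A is 8x8 split into k=2 blocks over N=4 workers;
# worker 3 straggles and is never waited for.
A, x = np.random.rand(8, 8), np.random.rand(8)
assert np.allclose(coded_matvec(A, x, N=4, k=2, straggler_ids={3}), A @ x)
```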
From the previous discussion, it can be seen that our purpose is to overcome the drawback of long computation time and make the algorithm straggler tolerant. Coding parameter
In the coding algorithm, the parameter
In the matrix completion algorithm, the distance matrix is restored by its low-rank decomposition matrix, where the rank
We first consider the overall runtime of an uncoded distributed algorithm, assuming that the task runtimes are independent and identically distributed. Then, we use
\begin{equation} {{T}_{\text{uncoded}}}=\max \{T_{\text{uncoded}}^{1},\cdots,T_{\text{uncoded}}^{N}\}. \end{equation} | (19) |
Then, we consider the overall runtime for a coded distributed algorithm.
\begin{equation} {{T}_{\text{MDS-coded}}}={{\min }_{{\cal{L}}}}{{\max }_{j\in {\cal{L}}}}T_{\text{coded}}^{j}. \end{equation} | (20) |
If the distribution of running time is subject to
We assume that the run times follow a shifted exponential distribution. This model is motivated by Ref. [26], where the authors use it to model file queries and latencies for cloud storage systems. The distribution of the running time is given by
\begin{equation} \Pr \left( {{T}_{0}}\leqslant t \right)=1-{{{\rm{e}}}^{-\mu (t-1)}},t\geqslant 1, \end{equation} | (21) |
where
The average runtime of any distributed algorithm is bounded by
\begin{equation} \text{E}\left[ {{T}_{\text{uncoded}}} \right]=\frac{1}{N}\left( 1+\frac{1}{\mu }\log N \right)=\varTheta \left( \frac{\log N}{N} \right), \end{equation} | (22) |
\begin{equation} \text{E}\left[ {{T}_{\text{MDS-coded}}} \right]=\varTheta \left( \frac{1}{N} \right). \end{equation} | (23) |
Therefore, we can design the optimal
\begin{equation} {{k}^{*}}=\left[ 1+\frac{1}{{{W}_{-1}}\left( -{{{\rm{e}}}^{-\mu -1}} \right)} \right]N, \end{equation} | (24) |
where
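As a sketch, the optimal k* of Eq. (24) can be evaluated numerically with the lower branch of the Lambert W function; the rounding/clipping rule and the example values of N and μ below are our own assumptions.

```python
import numpy as np
from scipy.special import lambertw

def optimal_k(N, mu):
    """Optimal MDS code dimension k* from Eq. (24), via the lower branch W_{-1}."""
    w = lambertw(-np.exp(-mu - 1.0), k=-1).real   # W_{-1}(-e^{-mu-1}) <= -1
    k_star = (1.0 + 1.0 / w) * N
    return int(np.clip(round(k_star), 1, N))      # keep k* a valid code dimension

# Example: N = 12 workers, straggling parameter mu = 1.0 (assumed values).
print(optimal_k(12, 1.0))
```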
In the matrix completion algorithm, the time required for each worker node in a single iteration to complete the matrix-matrix multiplication is
\begin{equation} \rho :=\frac{r}{K+2}. \end{equation} | (25) |
Therefore, the relationship between rank ratio
In the SSL algorithm, the distance matrix represents the topology of the original data set. An approximate completion of the distance matrix can therefore still achieve good algorithm accuracy, because the recovered approximate distance matrix preserves the topology of the data set. Assuming that given the initial distance matrix
\begin{equation} \gamma :=1-\frac{{{\left\| {\boldsymbol{D}}/\mu -{\boldsymbol{\tilde{D}}}/\nu \right\|}_{F}}}{{{\left\| {\boldsymbol{\tilde{D}}}/\nu \right\|}_{F}}}. \end{equation} | (26) |
Thus, we can select an appropriate
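A direct implementation sketch of the measure in Eq. (26) is given below; the normalization constants μ and ν are kept as explicit arguments because their precise definition depends on the preceding (partially elided) text.

```python
import numpy as np

def completion_accuracy(D, D_tilde, mu, nu):
    """Accuracy measure gamma of Eq. (26): compare the recovered EDM D with the
    reference EDM D_tilde after normalizing each by its own constant."""
    diff = np.linalg.norm(D / mu - D_tilde / nu, ord='fro')
    return 1.0 - diff / np.linalg.norm(D_tilde / nu, ord='fro')
```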
We conduct simulations to evaluate the performance of the CDGSSL framework. Our experiment is performed on the Alibaba Cloud elastic compute service (ECS) using two different instance types: n4.small and n4.large. The n4.large instance serves as the central server, and the clients are all n4.small ECS instances. We set the simulation parameters of the component algorithms as follows unless specified otherwise. For the coded matrix completion algorithm, we set
In Fig. 5, we measure the round-trip time on the server 1500 times and plot its complementary CDF. Note that the round-trip time consists of the computation and communication time for one iteration of the matrix-vector multiplication. We obtain an empirical distribution of task run time to observe the frequency of straggler nodes in our test bench and to design the optimal
The accuracy of the distance matrix completion algorithm versus the rank ratio ρ is shown in Fig. 6. The main purpose of this curve is to obtain the optimal rank ratio ρ*. We randomly generate 3000 training data points ranging from 0 to 25. The training data are divided into groups by their dimension d. Moreover, we calculate the block-diagonal distance matrix and complete it with the distance matrix completion algorithm. We can observe that the accuracy increases rapidly and then becomes relatively flat once the rank ratio reaches 0.4. This indicates that beyond this point, the additional computation time costs more than the accuracy it gains. We repeatedly measure the time required for matrix multiplication in the matrix completion algorithm on the n4.small client when
Fig. 7 shows 100 random images from the MNIST data set and their labels after the CDGSSL task. From the data set, we extract images of the digits 0 and 1 and randomly assign
In addition, we compare the proposed CDGSSL framework with the following three baseline schemes:
(Ⅰ) Local semi-supervised learning (LSSL). Each client only uses its local data set to perform the learning task of the label propagation algorithm. The client does not share information or communicate with other nodes in this scenario. Moreover, all the calculations are performed locally, which has the best privacy.
(Ⅱ) Centralized semi-supervised learning (CSSL). All data sets are concentrated on one node in the centralized semi-supervised learning scenario. Thus, the distance matrix can be computed by that node. It implements a label propagation algorithm on distance matrices without matrix completion or learning unknown labels. This baseline gives the best accuracy in graph-based learning scenarios.
(Ⅲ) Privacy-preserving semi-supervised learning (PSSL), i.e., the D-LapRLS algorithm in Ref. [22]. The clients collaborate to complete the distance matrix. Then the clients execute the label propagation algorithm through the global distance matrix. As far as accuracy is concerned, the D-LapRLS algorithm performs best among DGSSL algorithms.
Table 2 presents a comparison of the average classification accuracy of the CDGSSL framework and two baseline protocols. We conduct the simulations over two data sets including MNIST and CIFAR10. For the CDGSSL algorithm, we consider the average accuracy in three cases. Moreover, we set
Dataset | Protocol | Labeled & unlabeled data | \alpha | \beta | Rank (r) | Accuracy (%)
MNIST | CSSL | 200 & 12665 | 1.0E−03 | − | − | 95.99
 | CDGSSL | 200 & 12665 | 1.0E−03 | 1.0E−05 | 200 | 90.90
 | CDGSSL | 200 & 12665 | 1.0E−03 | 1.0E−05 | 150 | 89.73
 | CDGSSL | 200 & 12665 | 1.0E−03 | 1.0E−05 | 50 | 62.29
 | LSSL | 40 & 2533 | 1.0E−03 | − | − | 60.63
CIFAR10 | CSSL | 400 & 10000 | 1.0E−04 | − | − | 72.14
 | CDGSSL | 400 & 10000 | 1.0E−04 | 1.0E−05 | 3000 | 70.39
 | CDGSSL | 400 & 10000 | 1.0E−04 | 1.0E−05 | 2000 | 67.45
 | CDGSSL | 400 & 10000 | 1.0E−04 | 1.0E−05 | 1000 | 62.76
 | LSSL | 80 & 2000 | 1.0E−04 | − | − | 52.60
Table 3 compares the CDGSSL algorithm with three baseline schemes. We compare the accuracy and computational complexity overhead of four protocols. We can observe that the computational complexity of CDGSSL is much lower than that of PSSL in both matrix completion and label propagation, while the accuracy maintains a high level. The comparison indicates that the proposed algorithm reduces the computational complexity under the distributed computing framework and can apply to large-scale data sets.
Protocol | Storage | Matrix completion | Label propagation | Accuracy (%)
CSSL | \varTheta(MN) | 0 | \varTheta(NM^{2}) | 95.99
LSSL | \varTheta(M) | 0 | \varTheta(M^{2}) | 60.63
PSSL | \varTheta(M) | \varTheta((NM)^{2}r) | \varTheta((NM)^{2}) | 93.28
CDGSSL | \varTheta(M) | \varTheta(NM^{2}r/\log N) | \varTheta(NM^{2}/\log N) | 90.90
The average execution time of the CDGSSL algorithm and the uncoded algorithm (in Section 3.2) is presented in Fig. 8. In this experiment, we artificially add latency to each iteration, giving each client a 4% probability of becoming a straggler node. The delay is injected using the time.sleep() function when a client is selected as a straggler. We set
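For reference, the artificial straggler injection used in this experiment can be sketched as follows; only the 4% probability and the use of time.sleep() come from the text, while the delay length is an assumed value.

```python
import random
import time

def maybe_straggle(p=0.04, delay_s=5.0):
    """Artificially turn a client into a straggler: with probability p,
    sleep for delay_s seconds before computing the coded partial product."""
    if random.random() < p:
        time.sleep(delay_s)

# Each client calls maybe_straggle() at the start of every iteration.
```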
In this work, we have proposed CDGSSL for binary classification problems. The proposed algorithm overcomes the drawbacks of inefficient graph construction and the straggler problem. We first provided a parallel and distributed solution of matrix completion for efficient graph construction. Then, we proposed CDGSSL based on the MDS code to further tackle the straggler problem. Moreover, we provided an optimal parameter design to improve the performance of CDGSSL. Finally, numerical experiments have been conducted on Alibaba Cloud ECS with the MNIST and CIFAR10 datasets. Simulation results have demonstrated the effectiveness of our proposed algorithm: it alleviates the straggler effect by up to 33% and attains lower computational complexity compared with existing methods.
This work was supported by the National Key R&D Program of China (2021YFB2900302).
The authors declare that they have no conflict of interest.
[1] van Engelen J E, Hoos H H. A survey on semi-supervised learning. Machine Learning, 2020, 109: 373–440. DOI: 10.1007/s10994-019-05855-6
[2] Ang J C, Mirzal A, Haron H, et al. Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016, 13 (5): 971–989. DOI: 10.1109/tcbb.2015.2478454
[3] Zhu X, Goldberg A B. Introduction to Semi-Supervised Learning. Cham, Switzerland: Springer, 2009.
[4] Scudder H. Probability of error of some adaptive pattern-recognition machines. IEEE Transactions on Information Theory, 1965, 11 (3): 363–371. DOI: 10.1109/tit.1965.1053799
[5] Blum A, Mitchell T. Combining labeled and unlabeled data with co-training. In: COLT '98: Proceedings of the Eleventh Annual Conference on Computational Learning Theory. New York: ACM, 1998: 92–100.
[6] Belkin M, Niyogi P, Sindhwani V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 2006, 7: 2399–2434. DOI: 10.5555/1248547.1248632
[7] Belkin M, Niyogi P. Semi-supervised learning on Riemannian manifolds. Machine Learning, 2004, 56: 209–239. DOI: 10.1023/B:MACH.0000033120.25363.1e
[8] Chapelle O, Schölkopf B, Zien A. Transductive support vector machines. In: Semi-Supervised Learning. Cambridge: MIT Press, 2006: 105–117.
[9] Chong Y, Ding Y, Yan Q, et al. Graph-based semi-supervised learning: A review. Neurocomputing, 2020, 408 (30): 216–230. DOI: 10.1016/j.neucom.2019.12.130
[10] Zhu X, Ghahramani Z, Lafferty J. Semi-supervised learning using Gaussian fields and harmonic functions. In: ICML'03: Proceedings of the Twentieth International Conference on International Conference on Machine Learning. Washington, DC: AAAI Press, 2003: 912–919.
[11] Zhou D, Bousquet O, Lal T N, et al. Learning with local and global consistency. In: NIPS'03: Proceedings of the 16th International Conference on Neural Information Processing Systems. New York: ACM, 2003: 321–328.
[12] Chen J, Wang C, Sun Y, et al. Semi-supervised Laplacian regularized least squares algorithm for localization in wireless sensor networks. Computer Networks, 2011, 55 (10): 2481–2491. DOI: 10.1016/j.comnet.2011.04.010
[13] Szummer M, Jaakkola T. Partially labeled classification with Markov random walks. In: NIPS'01: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. Cambridge: MIT Press, 2001: 945–952.
[14] Grira N, Crucianu M, Boujemaa N. Active semi-supervised fuzzy clustering for image database categorization. In: MIR '05: Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval. New York: ACM, 2005: 9–16.
[15] Chapelle O, Zien A. Semi-supervised classification by low density separation. In: AISTATS 2005: Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics. Stuttgart, Germany: Max-Planck-Gesellschaft, 2005: 57–64.
[16] Kostopoulos G, Karlos S, Kotsiantis S, et al. Semi-supervised regression: A recent review. Journal of Intelligent & Fuzzy Systems, 2018, 35 (2): 1483–1500. DOI: 10.3233/JIFS-169689
[17] Torii M, Wagholikar K, Liu H. Using machine learning for concept extraction on clinical documents from multiple data sources. Journal of the American Medical Informatics Association, 2011, 18: 580–587. DOI: 10.1136/amiajnl-2011-000155
[18] Scardapane S, Fierimonte R, Wang D, et al. Distributed music classification using random vector functional-link nets. In: 2015 International Joint Conference on Neural Networks (IJCNN). Killarney, Ireland: IEEE, 2015: 1–8.
[19] Shih T K. Distributed multimedia databases. In: Shih T K, editor. Distributed Multimedia Databases: Techniques and Applications. Hershey, PA: IGI Global, 2002: 2–12.
[20] Shen P, Du X, Li C. Distributed semi-supervised metric learning. IEEE Access, 2016, 4: 8558–8571. DOI: 10.1109/ACCESS.2016.2632158
[21] Scardapane S, Fierimonte R, Di Lorenzo P, et al. Distributed semi-supervised support vector machines. Neural Networks, 2016, 80: 43–52. DOI: 10.1016/j.neunet.2016.04.007
[22] Fierimonte R, Scardapane S, Uncini A, et al. Fully decentralized semi-supervised learning via privacy-preserving matrix completion. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28: 2699–2711. DOI: 10.1109/TNNLS.2016.2597444
[23] Gan H, Li Z, Wu W, et al. Safety-aware graph-based semi-supervised learning. Expert Systems With Applications, 2018, 107: 243–254. DOI: 10.1016/j.eswa.2018.04.031
[24] Lee K, Lam M, Pedarsani R, et al. Speeding up distributed machine learning using codes. IEEE Transactions on Information Theory, 2018, 64 (3): 1514–1529. DOI: 10.1109/tit.2017.2736066
[25] Chen L, Han K, Du Y, et al. Block-division-based wireless coded computation. IEEE Wireless Communications Letters, 2022, 11 (2): 283–287. DOI: 10.1109/LWC.2021.3125983
[26] Agarwal A, Duchi J C. Distributed delayed stochastic optimization. In: 2012 IEEE 51st IEEE Conference on Decision and Control (CDC). Maui, USA: IEEE, 2012: 5451–5452.
[27] Alfakih A Y, Khandani A K, Wolkowicz H. Solving Euclidean distance matrix completion problems via semidefinite programming. Computational Optimization and Applications, 1999, 12: 13–30. DOI: 10.1023/A:1008655427845
[28] Al-Homidan S, Wolkowicz H. Approximate and exact completion problems for Euclidean distance matrices using semidefinite programming. Linear Algebra and Its Applications, 2005, 406: 109–141. DOI: 10.1016/j.laa.2005.03.021
[29] Liu W, Chen L, Zhang W. Decentralized federated learning: Balancing communication and computing costs. IEEE Transactions on Signal and Information Processing Over Networks, 2022, 8: 131–143. DOI: 10.1109/TSIPN.2022.3151242
[30] Liu W, Chen L, Chen Y, et al. Accelerating federated learning via momentum gradient descent. IEEE Transactions on Parallel and Distributed Systems, 2020, 31 (8): 1754–1766. DOI: 10.1109/TPDS.2020.2975189
[31] Wang Z, Du Y, Wei K, et al. Vision, application scenarios, and key technology trends for 6G mobile communications. Science China Information Sciences, 2022, 65: 151301. DOI: 10.1007/s11432-021-3351-5
Figure 1. Illustration of the distributed SSL setup with N clients and 1 server. Each client owns a labeled data set