Ruoxin Liu is currently a master's student in the Energy and Heat Transfer Laboratory of the Department of Thermal Science and Energy Engineering at the University of Science and Technology of China, under the supervision of Prof. Wenlong Cheng. His research focuses on high-heat-flux heat dissipation and related work on surfactants.
Wenlong Cheng received his PhD degree in engineering thermophysics from the University of Science and Technology of China in 2002 and is currently a professor there. His research interests include high-heat-flux heat dissipation and heat and mass transfer enhancement, thermal control and thermal management, thermal analysis of complex systems, and energy conversion and advanced power systems.
Graphical Abstract
Adding 200 ppm of sodium dodecyl sulfate (SDS) can increase the heat transfer coefficient of the spray cold plate by 19.8%
Abstract
The spray cold plate has a compact structure and provides high-efficiency heat exchange, which can meet the high-heat-flux dissipation requirements of multiple heat sources, making it a reliable means of cooling next-generation chips. This paper proposes using surfactants to enhance the heat transfer of the spray cold plate and presents a systematic experimental study of its heat transfer performance under different types and concentrations of additives. Among the three surfactants tested, sodium dodecyl sulfate (SDS) improved the heat transfer performance of the spray cold plate; at the optimal concentration of 200 ppm, the heat transfer coefficient increased significantly, by 19.8%. In contrast, both the n-octanol-distilled water and the Tween 20-distilled water solutions reduced the heat transfer performance of the cold plate with multiple nozzles. In addition, based on the experimental data, dimensionless heat transfer correlations for the spray cold plate with additives were developed; their maximum errors were 2.1%, 2.8%, and 5.4%, respectively. These findings provide a theoretical basis for the improvement of spray cold plates.
Public Summary
The spray cold plate can meet the heat dissipation needs of compact and multiple heat sources.
The enhancement effect of additives is greatly weakened under spray cold plate conditions, but certain additive concentrations still have positive effects. For example, adding 200 ppm of SDS can increase the heat transfer coefficient of the spray cold plate by 19.8%.
The mechanism of additive enhancement was studied, and heat transfer correlations for additives acting on the spray cold plate were obtained; the maximum errors were 2.1%, 2.8%, and 5.4%, respectively.
1.
Introduction
Deep learning has seen considerable growth in the past decade [1-4]. State-of-the-art deep learning models such as LeNet [5], VGG [6], GoogLeNet [7], ResNet [8], and EfficientNet [9] have achieved superior performance in computer vision applications such as object recognition [10], face recognition [11], and image classification [12, 13]. In addition, numerous deep learning frameworks have been released to help engineers and researchers easily develop deep-learning-based systems or conduct research.
Although these frameworks allow neural networks to be deployed easily in real-world applications, training neural network models is still a daunting task, as it requires a considerable amount of data and computational resources. Therefore, pretrained models are provided on websites to facilitate the reproduction of published results. Currently, the sharing of well-trained models is essential for both the research and development of deep neural network systems. Numerous pretrained models have been uploaded by developers to websites such as PyTorch Hub① and Papers With Code②.
Owing to the significant progress made in deep learning, neural networks are now being used as steganography cover media. Song et al. [14] made the first attempt to embed messages in neural network parameters and proposed three methods: least significant bit (LSB) encoding, which simply embeds the message in the lower bits of the parameters of well-trained models; correlated value encoding, which forces the parameters to be highly correlated with the embedded message by means of a malicious regularization term; and sign encoding, which encodes the message in the signs of the parameters.
Consequently, malicious developers can use neural networks to exchange messages imperceptibly. Liu et al. proposed StegoNet [15], which turns a deep learning model into stegomalware by using the model parameters as a payload injection channel. StegoNet causes no significant decrease in accuracy, and its triggers are connected to the physical world by input specification. StegoNet focuses on embedding methods that are effective on both uncompressed and deeply compressed models. Deeply compressed models, such as VGG-16 Compressed and AlexNet Compressed [16], shrink in size by reducing both the number and the precision of the parameters of a general uncompressed model. Liu et al. showed that, as in image steganography, LSB substitution in deeply compressed models leads to significant bias in the statistics of the parameters. Therefore, LSB substitution in a deeply compressed model is easily detected by traditional image steganalysis methods such as primary sets [17], sample pairs [18], chi-square analysis [19], and RS analysis [20-21], whereas LSB substitution in uncompressed models is challenging to detect. For these scenarios, Liu et al. proposed three improved embedding methods based on LSB substitution: resilience training, value-mapping, and sign-mapping. However, the approaches proposed by Liu et al. mainly focus on selecting the positions with minimum distortion to embed the secret message; messages embedded by these approaches therefore rely on a considerable amount of auxiliary information, such as positions stored in other parts of the stegomalware, to ensure extraction. In contrast, the approaches proposed by Song et al. are more universal and can be leveraged to harm society.
Details about models can be easily obtained because developers generally share the model settings, such as the structure and dataset, when they publish the model. However, as the training process initializes the parameters almost randomly, it is still impossible to determine the cover model even with the information developers provide. Therefore, it is necessary to design a specific model detection method. In this paper, we demonstrate that the methods proposed by Song et al. [14] modify the statistical distribution of a model. An analysis of LSB encoding revealed that the randomness of bits systematically increases during the embedding of the secret message; therefore, we can build discriminative features by measuring the randomness of the bit plane. Our analyses reveal that, with correlated value encoding and sign encoding, the distribution of parameters varies during embedding; therefore, we can design statistical features of the parameters to measure the differences between the varying distributions. A logistic regression classifier is then used to capture the bias of the statistical features for classification. The experimental results reveal that our methods are effective at detecting models with embedded messages even when the payload of the stego model is low.
The remainder of the paper is structured as follows. Section 2 reviews the related work on neural networks and steganography of neural networks. In Section 3, we propose methods for detecting the presence of steganography in a given model. In Section 4, we present the results of experiments that validate the effectiveness of our proposed methods. Section 5 concludes this paper and discusses future research directions.
2.
Preliminary and prior work
2.1
Neural network
Machine learning can be divided into supervised learning and unsupervised learning. Although we focus on supervised learning in this paper, our approaches can be applied to unsupervised learning as well. Let {\cal{X}} be the input space, where {\boldsymbol{x}}_i \in {\mathbb{R}}^d is the i -th instance, and let {\cal{Y}} be the output space, where y_i \in \{1,\cdots,K\} is the true label of the i -th instance and K \ge 2 is the number of classes. Given a set of data points D = \{({\boldsymbol{x}}_i, y_i)\}_{i = 1}^{n} , where ({\boldsymbol{x}}_i, y_i) \in {\cal{X}} \times {\cal{Y}} , this set is partitioned into two subsets: training data D_{\rm{train}} and testing data D_{\rm{test}} . A machine learning model is a function f_{\boldsymbol{\theta}}: {\cal{X}} \to {\cal{Y}} parameterized by parameters {\boldsymbol{\theta}} . Deep neural networks are most often used when it is difficult to state explicitly how a function should be computed. In a deep neural network, f is composed of layers of nonlinear transformations that map the input to a series of intermediate states and then to the output. The parameters {\boldsymbol{\theta}} are the weights used in each transformation. As the depth of the network increases, the number of parameters increases.
To find the optimal set of parameters {\boldsymbol{\theta}} for the function f , the objective function, which penalizes the mismatch between the true label y_i and the predicted label generated by f_{\boldsymbol{\theta}}({\boldsymbol{x}}_i) , is minimized. Empirical risk minimization is a general framework that uses the following objective function over D_{\rm{train}} :
\min\limits_{{\boldsymbol{\theta}}} \; \dfrac{1}{|D_{\rm{train}}|} \sum\limits_{({\boldsymbol{x}}_i, y_i) \in D_{\rm{train}}} L\left(y_i, f_{\boldsymbol{\theta}}({\boldsymbol{x}}_i)\right) + \Omega({\boldsymbol{\theta}})
(1)
where L is the cross-entropy loss function and \Omega ({\boldsymbol{\theta}}) is the regularization term that prevents models from overfitting.
In a general deep neural network model, the parameters are 32-bit floating point numbers following IEEE standard 754 [22]. According to IEEE standard 754, floating-point data are represented in the form (-1)^s \times 2^e \times t , where s is the sign, e is the exponent, obtained by subtracting the bias of 127 from the biased exponent field, and t is the significand, whose fractional part (the digits below the implicit leading 1) is stored in the trailing significand field. A representation of 32-bit floating-point data in the binary interchange format is given in Fig. 1. In this example, the float32 number is 0.75 = (-1)^0 \times 2^{-1} \times (1+2^{-1}) , where s = 0 , e = 126-127 = -1 ( 127 is the bias), and t = 1+2^{-1} . Steganography methods that embed in well-trained models operate on the binary form of the parameters: the secret message is embedded directly in the lower bit planes. Steganography methods that embed during training operate on the decimal form of the parameters: the secret message is mapped into the values or the signs of the parameters.
Figure 1. An example of the binary interchange format for a float32 number, where the green part is the sign, the yellow part is the biased exponent, and the red part is the trailing significand field.
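As a concrete illustration, the following minimal NumPy sketch (our own, not from any particular library) decomposes a float32 into the three fields and reproduces the 0.75 example above:

```python
import numpy as np

# A minimal check of the binary32 layout for 0.75 (illustrative only).
x = np.float32(0.75)
bits = np.frombuffer(x.tobytes(), dtype=np.uint32)[0]  # raw 32-bit pattern

s = (bits >> 31) & 0x1            # 1-bit sign
e_field = (bits >> 23) & 0xFF     # 8-bit biased exponent field
frac = bits & 0x7FFFFF            # 23-bit trailing significand field

e = int(e_field) - 127            # subtract the bias of 127
t = 1 + frac / 2**23              # implicit leading 1 for normal numbers

print(s, e_field, e, t)           # 0 126 -1 1.5
print((-1)**s * 2.0**e * t)       # 0.75
```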
Neural network steganography (NNS) is a timely issue, as neural networks are extremely redundant and are widely distributed over the Internet. Malicious developers publish white-box models with detailed structures and parameters, and receivers decode the secret message from the parameters. Such a scenario can arise on a third-party platform where people publish and use models. Even if the platform is secure, the model supplied by the developer may not be trustworthy, and the security of the model is often overlooked by platforms and users.
NNS was proposed by Song et al. [14] to embed messages in uncompressed models and was extended by Liu et al. [15] to embed secret messages in both compressed and uncompressed models. The approaches proposed by Liu et al. synthesize deep learning models with stegomalware [23]. Deep compression shrinks model size by reducing the number and precision of the model parameters; consequently, to maintain the accuracy of a compressed model, the secret message that can be embedded must be short, and a significant amount of auxiliary information must be extracted successfully.
Although the approaches proposed by Liu et al. and Song et al. follow the same logic, those proposed by Song et al. focus more on uncompressed neural networks. As the methods proposed by Liu et al. have more limitations and require more auxiliary information for extraction than those proposed by Song et al., we focus on the steganalysis of uncompressed neural networks, specifically on the three techniques proposed by Song et al. [14]: LSB encoding, correlated value encoding (COR), and sign encoding (SGN).
LSB encoding: In this method, the secret message is embedded directly in the least significant (lower) bits of the model parameters. First, malicious developers train a benign model. Then they post-process the model parameters {\boldsymbol{\theta}} by replacing the lower bits of the parameters with the secret message {\boldsymbol{m}} , producing modified parameters {\boldsymbol{\theta}}' . At the receiving end, the receiver extracts the message by reading the lower bits of the parameters and interpreting them as bits of the secret message.
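The following minimal sketch (our illustration, not the authors' code) embeds one encrypted message bit per parameter in the lowest bit plane and extracts it again; targeting a higher bit plane, as in the experiments below, only changes the bit index:

```python
import numpy as np

def lsb_embed(params: np.ndarray, message_bits: np.ndarray) -> np.ndarray:
    """Overwrite the least significant bit of the first len(message_bits)
    float32 parameters with the message bits."""
    raw = params.astype(np.float32).view(np.uint32).copy()
    n = len(message_bits)
    raw[:n] = (raw[:n] & ~np.uint32(1)) | message_bits.astype(np.uint32)
    return raw.view(np.float32)

def lsb_extract(stego_params: np.ndarray, n: int) -> np.ndarray:
    """Read the first n message bits back from the lowest bit plane."""
    raw = stego_params.astype(np.float32).view(np.uint32)
    return (raw[:n] & np.uint32(1)).astype(np.uint8)

theta = np.random.randn(10000).astype(np.float32)    # stand-in for model weights
m = np.random.randint(0, 2, 8000).astype(np.uint8)   # encrypted message bits
theta_stego = lsb_embed(theta, m)
assert np.array_equal(lsb_extract(theta_stego, len(m)), m)
```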
Correlated value encoding: This approach gradually embeds the secret message during training. The secret message is embedded by forcing the parameters to be highly correlated with it. In detail, a malicious correlation term C , the negative absolute Pearson correlation between the parameters and the message, is added to the loss function:
C = -\lambda_c \left| \dfrac{\sum\limits_{i = 1}^{l} (\theta_i-\bar{\theta})(m_i-\bar{m})}{\sqrt{\sum\limits_{i = 1}^{l} (\theta_i-\bar{\theta})^2} \sqrt{\sum\limits_{i = 1}^{l} (m_i-\bar{m})^2}} \right|
In the above expression, \lambda_c controls the level of correlation, {\boldsymbol{m}} is the secret message with length l , and \bar{\theta} and \bar{m} are the mean values of the parameters {\boldsymbol{\theta}} and the secret message {\boldsymbol{m}} , respectively. During training, the malicious term drives the gradient direction towards a local minimum where the secret and the parameters are highly correlated. Therefore, the larger \lambda_c is, the more correlated {\boldsymbol{\theta}} and {\boldsymbol{m}} are. Recovering the secret message from the model only requires mapping the parameters back to the feature space, because the correlated parameters are approximately a linear transformation of the secret message.
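A sketch of how such a term can be added to a PyTorch training loss, assuming the Pearson-correlation form given above (function and variable names are ours):

```python
import torch

def correlation_penalty(theta: torch.Tensor, m: torch.Tensor, lambda_c: float) -> torch.Tensor:
    """Malicious term C: rewards a high |Pearson correlation| between the
    first l parameters and the secret message m (both 1-D tensors)."""
    l = m.numel()
    t = theta.flatten()[:l]
    tc, mc = t - t.mean(), m - m.mean()
    corr = (tc * mc).sum() / (tc.norm() * mc.norm() + 1e-12)
    return -lambda_c * corr.abs()   # minimizing the loss maximizes |corr|

# During training (hypothetical names):
# flat = torch.cat([p.flatten() for p in model.parameters()])
# loss = task_loss + correlation_penalty(flat, secret, lambda_c=0.1)
```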
Sign encoding: Sign encoding is another method that can be used to encode a secret message as a model is trained. Like correlated value encoding, sign encoding adds a malicious correlation term to the loss function; however, the secret message {\boldsymbol{m}} is embedded in the signs of the parameters. In detail, a positive parameter represents 1 and a negative parameter represents 0. Writing b_i = 2m_i-1 \in \{-1, +1\} for the sign form of bit m_i , the malicious correlation term P penalizes parameters whose signs disagree with the message:
P = \dfrac{\lambda_s}{l} \sum\limits_{i = 1}^{l} \max(0, -\theta_i b_i)
In the above expression, \lambda_s controls the level of correlation, and l is the length of the secret message {\boldsymbol{m}} . Recovering the secret message from the model only requires reading the signs of the parameters and interpreting them as bits of the secret message.
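A corresponding sketch for the sign penalty under the hinge form reconstructed above (again our illustration):

```python
import torch

def sign_penalty(theta: torch.Tensor, bits: torch.Tensor, lambda_s: float) -> torch.Tensor:
    """Malicious term P: zero once every parameter sign matches its message bit."""
    l = bits.numel()
    t = theta.flatten()[:l]
    b = 2.0 * bits.float() - 1.0                      # bit {0,1} -> sign {-1,+1}
    return (lambda_s / l) * torch.relu(-t * b).sum()  # penalize mismatched signs

# Extraction only reads the signs:
# recovered_bits = (theta.flatten()[:l] > 0).long()
```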
Unlike with LSB encoding, with correlated value encoding and sign encoding the secret message cannot always be decoded completely and correctly. However, correlated value encoding and sign encoding are more robust against fine-tuning.
3.
Proposed methods
Even though NNS methods [14] can embed secret messages without performance degradation, biases inevitably occur in the model statistics. Moreover, it is easy to obtain information about a model, as developers need to specify its settings, such as the model structure and dataset, when the model is released. With this information, we can build benign and malicious models and train classifiers to detect the malicious ones. In this section, we first present the detection framework for the steganalysis of NNS. Then, we describe the model distribution bias generated by each method and design effective features for detection.
3.1
Framework of neural network steganalysis
The steganography methods proposed in Ref. [14] cause different biases. For LSB encoding, the randomness of a benign bit plane differs from that of a malicious bit plane. For correlated value encoding and sign encoding, the distribution of parameters in each layer of a benign model differs from that of a malicious model. However, as the steganography method applied by the malicious developer is unknown, we design an overall framework for our neural network steganalysis to facilitate comprehensive detection. Fig. 2 illustrates the overall framework of our proposed steganalysis methods.
Figure 2. Illustration of the proposed framework for neural network steganalysis.
(Ⅰ) Feature extraction: Since the NNS method used by the malicious developer is unknown, we extract features from the injected model for each NNS method to ensure comprehensive detection. For LSB encoding, we capture the distribution of the bit plane as features. For correlated value encoding and sign encoding, we capture the distribution of the parameters as features. All feature names in this paper follow the convention {\rm{name}} = \{ \phi \}_{\#} , where \phi represents the feature and \# is its sequence number. The features captured by the i -th steganalysis method jointly constitute the multidimensional feature vector \Phi_i , where i \in \{ 1, 2, 3 \} : method 1 is LSB encoding steganalysis, method 2 is correlated value encoding steganalysis, and method 3 is sign encoding steganalysis.
(Ⅱ) Classification: Essentially, steganalysis is a binary classification task. We use logistic regression [24-26] as our classifier. Let {\cal{X}} = \{{\boldsymbol{x}}_1,..., {\boldsymbol{x}}_n\} be the input space, where {\boldsymbol{x}}_i is the feature vector of the i -th model extracted by one of the steganalysis methods; {\cal{Y}} = \{ y_1,...,y_n \} is the output space, where y_i is the true label of the i -th model; {\boldsymbol{w}} is the parameter vector of the classifier; and n is the number of models used to train the classifier. The objective function of our classifier is the standard regularized form
\min\limits_{{\boldsymbol{w}}} \; \sum\limits_{i = 1}^{n} {L_{\rm{log}}}\left(y_i, g({\boldsymbol{x}}_i;{\boldsymbol{w}})\right) + \epsilon \lVert {\boldsymbol{w}} \rVert_2^2
where \epsilon denotes the coefficient of the regularization term and g({\boldsymbol{x}}_i;{\boldsymbol{w}}) is the predicted probability that the i -th model is malicious. The loss function {L_{\rm{log}}}(y_i, g({\boldsymbol{x}}_i;{\boldsymbol{w}})) is the cross-entropy loss
{L_{\rm{log}}}(y_i, g({\boldsymbol{x}}_i;{\boldsymbol{w}})) = -y_i \log g({\boldsymbol{x}}_i;{\boldsymbol{w}}) - (1-y_i) \log \left(1-g({\boldsymbol{x}}_i;{\boldsymbol{w}})\right)
The ensemble reaches its decision by fusing the decisions of all the subclassifiers through a voting process: the ensemble judges an injected model as benign only when all the subclassifiers consider it a cover model.
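In code, this fusion rule is a simple OR over the subclassifier votes (a sketch; the names are ours):

```python
def fuse_decisions(votes):
    """votes: iterable of subclassifier decisions, 0 = cover, 1 = stego.
    The model is judged benign only if every subclassifier votes cover."""
    return int(any(votes))

assert fuse_decisions([0, 0, 0]) == 0   # benign
assert fuse_decisions([0, 1, 0]) == 1   # flagged as stego
```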
3.2
Feature extraction
As these steganography methods lead to biases, the artifacts they generate can be identified by effective features. In this subsection, we describe feature extraction in detail.
Multiple features can be used to identify each bias. For each steganalysis method, we evaluate the performance of each candidate feature to determine whether it is optimal, thereby obtaining a set of optimal features. For each feature, we train a logistic regression classifier and obtain the detection accuracy. For LSB encoding, ResNet34 [8] trained on CIFAR10 [27] is used to evaluate the performance of each feature, and we train 100 benign models and 100 malicious models with a payload of 1.0 bits per parameter. For correlated value encoding and sign encoding, ResNet34 [8], VGG16 [6], and EfficientNetB0 [9] trained on CIFAR10 are used to evaluate the performance of each feature, and for each network, we train 100 benign models and 100 malicious models with payloads of 0.2, 0.6, and 1.0 bits per parameter.
(Ⅰ) Steganalysis of LSB encoding: For security purposes, a secret message is always encrypted before it is embedded; thus, the secret message can be regarded as a random binary message. High-precision parameters are not required for a model to perform well, so modifying the lower bits of the parameters does not lead to significant performance degradation. However, the randomness of benign bit planes is lower than that of malicious bit planes. For example, in the binary secret message, 0 and 1 are uniformly distributed, whereas the distribution of 0 and 1 in a benign bit plane is biased. Therefore, we build our steganalysis algorithm by detecting such biases in the bit plane distribution.
Bias detection in the bit plane distribution can be implemented as a randomness test. Numerous randomness detection methods have been proposed [28-31], and the NIST statistical test suite [29] is one of the most representative; therefore, we design our steganalysis features with reference to the NIST statistical test suite, which includes 14 test statistics.
For an injected model, the lowest bit plane likely to be modified is extracted and tested for randomness. As the secret message length varies, not all parameters are changed; since we detect the bias in the bit plane distribution, varying the message length amounts to varying the payload of the specified bit plane.
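Extracting a given bit plane from the parameters is straightforward. The sketch below indexes bit planes from the least significant bit, which is our assumption about the paper's numbering:

```python
import numpy as np

def extract_bit_plane(params: np.ndarray, plane: int) -> np.ndarray:
    """Return the chosen bit plane of float32 parameters as a 0/1 sequence."""
    raw = np.ascontiguousarray(params, dtype=np.float32).view(np.uint32)
    return ((raw >> np.uint32(plane)) & np.uint32(1)).astype(np.uint8)

# e.g., bits = extract_bit_plane(weights, 14) before computing randomness features
```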
To find the optimal set of features, we use the 14 test statistics of the NIST statistical test suite to train subclassifiers. For each subclassifier, we measure the average detection accuracy over the 14th to 18th bit planes of ResNet34 trained on CIFAR10 with a payload of 1.0 bits per parameter. Fig. 3 shows the average accuracy of each subclassifier. We select the statistics with the top four accuracies as features: the statistics of the frequency test, serial test, approximate entropy test, and cumulative sums test, denoted by \phi_1 , \phi_2 , \phi_3 , and \phi_4 , respectively.
Figure 3. Average accuracy over the 14th to 18th bit planes of ResNet34 trained on CIFAR10 with a payload of 1.0 bits per parameter for each subclassifier.
\phi_1 evaluates whether the proportions of 0 and 1 in a sequence are similar to those in a random sequence. For a binary secret message, the numbers of 0s and 1s should be about the same, whereas a sequence extracted from a benign bit plane shows a bias in the proportions of 0 and 1. We convert each 0 in the sequence to -1 and compute the sum {\rm{Sum}}(n) of the sequence with length n . \phi_1 is then the normalized magnitude of this sum:
\phi_1 = \dfrac{\lvert {\rm{Sum}}(n) \rvert}{\sqrt{n}}
\phi_2 and \phi_3 focus on the frequency of all possible overlapping blocks across the sequence. \phi_2 evaluates whether the number of occurrences of the overlapping patterns is approximately the same as expected for a random sequence. We count the frequency of the p -bit pattern i_1 \cdots i_p as v_{i_1 \cdots i_p} and the frequency of the (p-1) -bit pattern i_1 \cdots i_{p-1} as v_{i_1 \cdots i_{p-1}} . Following the NIST serial test, \phi_2 is computed as
\phi_2 = \psi^2_p - \psi^2_{p-1}, \quad {\rm{where}} \;\; \psi^2_p = \dfrac{2^p}{n}\sum\limits_{i_1 \cdots i_p} v_{i_1 \cdots i_p}^2 - n
\phi_3 uses the approximate entropy to compare the frequency of overlapping blocks of two consecutive lengths ( p and p+1 ) against the expected result for a random sequence. Let the count of the possible p -bit values be described as
C^p_i = \dfrac{\#i}{n}
(9)
where i represents a p -bit value, \#i represents the calculated count of the value i , and n represents the length of the sequence. The approximate entropy presents the difference in frequency between the p -bit overlapping blocks and the (p+1) -bit overlapping blocks. The approximate entropy (ApEn) is computed as
{\rm{ApEn}}(p) = \varphi^{(p)}-\varphi^{(p+1)}
(10)
\varphi^{(p)} is the entropy of the empirical distribution over the set of all 2^p possible patterns of length p :
\varphi^{(p)} = \sum\limits_{i = 0}^{2^p-1} C^p_i \log C^p_i
(11)
\phi_3 measures the match between the observed value of {\rm{ApEn}}(p) and the expected value; thus,
\phi_3 = 2n\left[ \log2-{\rm{ApEn}}(p) \right]
(12)
\phi_4 evaluates whether the cumulative sum of the partial sequences is approximately the same as the expected cumulative sum of random sequences. We convert 0 in the sequence to -1 and compute the sums {\rm{Sum}}(i) of successively larger subsequences with length i starting from the beginning:
{\rm{Sum}}(i) = {\rm{Sum}}(i-1)+X_i, i = 2,...,n
(13)
where X_i is the i -th value of the converted sequence, {\rm{Sum}}(1) = X_1, and n is the length of the sequence. We use the largest excursion from the origin of the cumulative sums as feature \phi_4 , thus
\phi_4 = {\rm{max}}_{1 \le k \le n}|{\rm{Sum}}(k)|
(14)
The total feature vector of LSB encoding steganalysis is \Phi_1 = \{\phi_1, \phi_2, \phi_3, \phi_4 \} .
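The four statistics can be computed directly from a bit sequence. The sketch below follows the NIST definitions with cyclically extended blocks; the block sizes p = 2, 3 are our choice for illustration:

```python
import numpy as np

def block_counts(bits, p):
    # Frequencies of all overlapping p-bit patterns, with the sequence
    # extended cyclically as in the NIST serial / approximate entropy tests.
    ext = np.concatenate([bits, bits[:p - 1]]) if p > 1 else bits
    v = np.zeros(2 ** p, dtype=np.int64)
    for i in range(len(bits)):
        idx = 0
        for j in range(p):
            idx = (idx << 1) | int(ext[i + j])
        v[idx] += 1
    return v

def psi_sq(bits, p):
    # psi^2_p = (2^p / n) * sum(v^2) - n  (NIST serial test)
    v = block_counts(bits, p).astype(float)
    return 2 ** p / len(bits) * np.sum(v ** 2) - len(bits)

def lsb_features(bits):
    bits = np.asarray(bits, dtype=np.uint8)          # 0/1 bit-plane sequence
    n = len(bits)
    sums = np.cumsum(2 * bits.astype(np.int64) - 1)  # map 0 -> -1, then sum
    phi1 = abs(sums[-1]) / np.sqrt(n)                # frequency test
    phi2 = psi_sq(bits, 3) - psi_sq(bits, 2)         # serial test, p = 3

    def entropy(p):  # phi^(p) = sum C_i^p * log(C_i^p)
        c = block_counts(bits, p) / n
        c = c[c > 0]
        return float(np.sum(c * np.log(c)))

    apen = entropy(2) - entropy(3)                   # ApEn(p) with p = 2
    phi3 = 2 * n * (np.log(2) - apen)                # approximate entropy test
    phi4 = np.max(np.abs(sums))                      # cumulative sums test
    return np.array([phi1, phi2, phi3, phi4])
```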
(Ⅱ) Steganalysis of correlated value encoding: Correlated value encoding constrains the value range of the parameters through the malicious correlation term, so a trade-off between the embedding degree and the model accuracy is struck during training. We observe that, owing to the constraints on the parameter values, the parameters of a malicious model follow a different distribution from those of a benign model. Therefore, our steganalysis algorithm uses the deviation in the parameter distribution to detect correlated value encoding.
To detect the bias caused by correlated value encoding, we use moments as features to describe the shape of the distribution. The most commonly used moments are the first-, second-, third-, and fourth-order moments, so we design our steganalysis features accordingly. As the distribution of parameters may vary between the layers of a neural network, a more detailed description of the parameter distribution is necessary; we therefore describe the distribution of the parameters in each layer separately.
{\boldsymbol{\phi}_{\bf{5}}} is the expectation, which measures the average value of the parameters in each layer and is defined as
\phi_5^{(j)} = \dfrac{1}{n_j}\sum\limits_{i = 1}^{n_j} \theta_i^{(j)}
where n_j is the number of parameters in the j -th layer and \theta_i^{(j)} is the i -th parameter of that layer.
{\boldsymbol{\phi}_{\bf{6}}} is the variance, which measures the degree of deviation of the parameters from their expectation. {\boldsymbol{\phi}_{\bf{6}}} is computed by
\phi_6^{(j)} = \dfrac{1}{n_j}\sum\limits_{i = 1}^{n_j} \left(\theta_i^{(j)}-\phi_5^{(j)}\right)^2
{\boldsymbol{\phi}_{\bf{7}}} is the skewness, which measures the direction and degree of skew of the parameter distribution. {\boldsymbol{\phi}_{\bf{7}}} is computed by
\phi_7^{(j)} = \dfrac{1}{n_j}\sum\limits_{i = 1}^{n_j} \left(\dfrac{\theta_i^{(j)}-\phi_5^{(j)}}{\sqrt{\phi_6^{(j)}}}\right)^3
{\boldsymbol{\phi}_{\bf{8}}} is the kurtosis, the fourth-order moment, which measures the heaviness of the tails of the parameter distribution. {\boldsymbol{\phi}_{\bf{8}}} is computed by
\phi_8^{(j)} = \dfrac{1}{n_j}\sum\limits_{i = 1}^{n_j} \left(\dfrac{\theta_i^{(j)}-\phi_5^{(j)}}{\sqrt{\phi_6^{(j)}}}\right)^4
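A per-layer implementation sketch in PyTorch (treating each parameter tensor as one "layer", which is our simplification):

```python
import torch

def layer_moments(model: torch.nn.Module):
    """Per-layer expectation (phi5), variance (phi6), skewness (phi7), and
    kurtosis (phi8) of the model parameters."""
    phi5, phi6, phi7, phi8 = [], [], [], []
    for w in model.parameters():
        t = w.detach().flatten().float()
        mu = t.mean()
        var = t.var(unbiased=False)
        z = (t - mu) / var.sqrt().clamp_min(1e-12)  # standardized parameters
        phi5.append(mu.item())
        phi6.append(var.item())
        phi7.append((z ** 3).mean().item())
        phi8.append((z ** 4).mean().item())
    return phi5, phi6, phi7, phi8
```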
Each feature is a vector whose dimension equals the number of layers in the model. To find the optimal set of features, we use the four features to train subclassifiers. For each subclassifier, the average accuracy is obtained over ResNet34, VGG16, and EfficientNetB0 with \lambda_c = 0.1 and payloads of 0.2, 0.6, and 1.0 bits per parameter. Fig. 4 shows the average accuracy of each subclassifier. Following the settings in Ref. [14], gray-scale images are embedded as the secret message; hence, 8 bits of the secret message are embedded per parameter. As shown in Fig. 4, {\boldsymbol{\phi}_{\bf{6}}} and {\boldsymbol{\phi}_{\bf{8}}} give accuracies above 0.95, which are higher than those obtained with {\boldsymbol{\phi}_{\bf{5}}} and {\boldsymbol{\phi}_{\bf{7}}} . Thus, we choose {\boldsymbol{\phi}_{\bf{6}}} and {\boldsymbol{\phi}_{\bf{8}}} as the features for detecting the models, and the total feature vector of correlated value encoding steganalysis is \Phi_2 = \{{\boldsymbol{\phi}_{\bf{6}}}, {\boldsymbol{\phi}_{\bf{8}}} \} .
Figure 4. Detection results for ResNet34, VGG16, and EfficientNetB0 trained on CIFAR10 using different statistics as features: (a) correlated value encoding steganalysis; (b) sign encoding steganalysis.
(Ⅲ) Steganalysis of sign encoding: Sign encoding also embeds a secret message through a malicious correlation term during training; in this case, however, the message is embedded in the signs of the parameters. Theoretically, the malicious correlation term makes each parameter and the corresponding secret bit share the same sign. In practice, however, not all sign constraints are met: the malicious term penalizes the mismatched parameters by pulling them toward 0, which makes the parameter distribution differ from that of a benign model. Therefore, our steganalysis method uses this difference in the parameter distribution to detect the secret message.
As sign encoding also disrupts the parameter distribution, we select features using an approach similar to that of correlated value encoding steganalysis. To detect the bias caused by sign encoding, we again design our steganalysis features from the first-order moment {\boldsymbol{\phi}_{\bf{5}}} (expectation), second-order moment {\boldsymbol{\phi}_{\bf{6}}} (variance), third-order moment {\boldsymbol{\phi}_{\bf{7}}} (skewness), and fourth-order moment {\boldsymbol{\phi}_{\bf{8}}} (kurtosis). Similarly, the distribution is described for each layer separately.
As in the experiment for correlated value encoding steganalysis, for each subclassifier, the average accuracy is obtained over ResNet34, VGG16, and EfficientNetB0 with \lambda_s = 50 and payloads of 0.2, 0.6, and 1.0 bits per parameter. Fig. 4 shows the average accuracy of each subclassifier. As shown in Fig. 4, {\boldsymbol{\phi}_{\bf{5}}} and {\boldsymbol{\phi}_{\bf{7}}} provide accuracies above 0.95, which are higher than those achieved with {\boldsymbol{\phi}_{\bf{6}}} and {\boldsymbol{\phi}_{\bf{8}}} . Thus, we choose {\boldsymbol{\phi}_{\bf{5}}} and {\boldsymbol{\phi}_{\bf{7}}} as the features for detecting models, and the total feature vector of sign encoding steganalysis is \Phi_3 = \{{\boldsymbol{\phi}_{\bf{5}}}, {\boldsymbol{\phi}_{\bf{7}}} \} .
4.
Experiments
In this section, we introduce the setup of our experiments and then discuss the results, including the detection of the three steganography methods at different embedding rates.
4.1
Experimental setup
Table 1 summarizes the datasets and networks we used in our experiments.
Table 1. Models and datasets used in our experiments.
The image datasets used in our experiments are the well-known CIFAR10 [27] and Tiny-ImageNet [32]. ResNet34 [8], VGG16 [6], EfficientNetB0 [9], and MobileNet [33] are selected as the models to be detected. Table 1 shows the number of parameters of each model and the accuracy of the benign model. Our implementation and the corresponding initial architectures are based on PyTorch. In all our experiments, we set the mini-batch size to 128, the initial learning rate to 0.1, and the number of training epochs to 100. For networks trained on CIFAR10, we decrease the learning rate by a factor of 0.1 at epoch 60 for better convergence. For models trained on Tiny-ImageNet, we decrease the learning rate by a factor of 0.1 at epochs 40 and 60. In each experiment, models are validated and saved every epoch, and the model with the best validation accuracy is selected as the final model.
To measure the impact of the secret message length on detection, we design detection tasks with different payloads for all the steganography methods. For LSB encoding, the payloads are set to 0.05, 0.4, 0.8, and 1.0 bits per parameter. For correlated value encoding and sign encoding, as not all the secret bits can be embedded successfully, we set the payloads to 0.2, 0.6, and 1.0 bits per parameter.
For all the steganography methods with a payload of 1.0, we train 100 benign models and 100 malicious models. For LSB encoding with payloads of 0.05, 0.4, and 0.8, we train 100 benign models and 300 malicious models, the latter uniformly composed of three embedding variants: sequential embedding, sequential embedding from a random position, and random embedding. For correlated value encoding and sign encoding with payloads of 0.2 and 0.6, we train 100 benign models and 100 malicious models, which are embedded at random positions.
For the detection of LSB encoding, correlated value encoding, and sign encoding, we evaluate the performance of our detection methods by 5-fold cross-validation. In each fold, 80% of the models form the training set and the remaining 20% form the testing set.
We use logistic regression [24-26] as our classifier. In all our experiments, we set the number of training epochs to 100 and use liblinear to optimize the loss function. We also use the l_2 -norm as the regularizer, with the coefficient \epsilon set to 1.0. For the detection of LSB encoding, correlated value encoding, and sign encoding, we measure the average detection accuracy over the 5-fold cross-validation:
{\rm{ACC}} = \dfrac{1}{5}\sum\limits_{i = 1}^{5} \dfrac{{\rm{TP}}_i+{\rm{TN}}_i}{{\rm{TP}}_i+{\rm{TN}}_i+{\rm{FP}}_i+{\rm{FN}}_i}
where TP is the true positive count, TN is the true negative count, FP is the false positive count, FN is the false negative count, and i indexes the i -th cross-validation fold.
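This protocol maps directly onto scikit-learn. Note that LogisticRegression's C parameter is the inverse regularization strength, so matching the coefficient \epsilon = 1.0 with C = 1.0 is our assumption in this sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X: one feature vector per model (e.g., Phi_2); y: 0 = benign, 1 = stego.
# The placeholder data below stands in for features extracted from real models.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.repeat([0, 1], 100)

clf = LogisticRegression(solver="liblinear", penalty="l2", C=1.0, max_iter=100)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(scores.mean())  # average detection accuracy over the five folds
```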
4.2
Detection performance of LSB encoding
We train the neural networks from scratch on CIFAR10 and Tiny-ImageNet with different model structures and payloads and record the results. Table 2 shows the lowest bit plane that can be embedded without significantly reduced accuracy. For all models, embedding the 14th bit plane does not lead to a significant drop in model performance. In our experiments, we detect the bit planes from the 14th to the 18th. Table 3 shows the performance of LSB encoding steganalysis for the models trained on CIFAR10 and Tiny-ImageNet.
Table 2. Results of LSB encoding in our models. b is the lowest bit plane that can be embedded without significantly reduced accuracy.
Table 3. Results of LSB encoding steganalysis for models trained on CIFAR10 and Tiny-ImageNet at payloads of 1.0, 0.8, 0.4, and 0.05 on the 14th to 18th bit planes.
The results in Table 3 reveal that, except in a few cases, our approach can effectively detect all models from the 14th to the 18th bit plane. For all model architectures at the 14th bit plane with payloads of 0.4, 0.8, and 1.0, our method achieves 100% detection accuracy. However, as the bit plane rises and the payload decreases, the performance of our method degrades. For example, for EfficientNetB0 trained on CIFAR10 with a payload of 1.0, our method fails on the 18th bit plane, where the accuracy is 62.75%; when the payload decreases to 0.05, the accuracy is 77.72% even on the 14th bit plane. For models trained on CIFAR10, our method works better on ResNet34 and VGG16 than on EfficientNetB0; for models trained on Tiny-ImageNet, it works better on VGG16 than on MobileNet. This can be attributed to the number of parameters: for a specified bit plane, sequences formed from benign models with fewer parameters have greater randomness, which increases the difficulty of detection. Overall, the results in Table 3 indicate that, in most cases, the proposed method is an effective way to detect LSB encoding.
4.3
Detection performance of correlated value encoding
Following the setup in Ref. [14], we use gray-scale images as the secret message to be embedded. We train the neural networks on CIFAR10 and Tiny-ImageNet with different model structures, coefficients \lambda_c , and payloads. Table 4 shows the appropriate coefficients \lambda_c and the model accuracies. We use the mean absolute error (MAE) to measure the embedding degree: given the embedded parameters {\boldsymbol{\theta}} and the secret {\boldsymbol{m}} , {\rm{MAE}} = \dfrac{1}{l}\displaystyle \sum\limits_{i = 1}^l{ \lvert \theta_i-m_i \rvert } , where l is the length of the secret message {\boldsymbol{m}} . The range of the MAE is [0, 255] , where 0 means the decoded and secret messages are identical. Table 5 shows the performance of our steganalysis method for models trained on the different datasets at payloads of 0.2, 0.6, and 1.0.
Table 4. Results of correlated value encoding in our models. \lambda_c is a suitable coefficient for the correlation term. The mean absolute error (MAE) measures the embedding degree. Test ACC indicates the performance of malicious models trained with different \lambda_c .
Table 5. Results of correlated value encoding steganalysis for models trained on CIFAR10 and Tiny-ImageNet. For all models, the payloads are set to 1.0, 0.6, and 0.2, separately.
From Table 5, it can be seen that our approach effectively detects all the models with different \lambda_c and payloads. For all the models with a payload of 1.0, our method achieves 100.00% detection accuracy. As the payload decreases, except in a few cases, the detection performance decreases. For ResNet34 trained on CIFAR10 with \lambda_c = 0.05 , the detection accuracy at a payload of 0.6 is 98.20%, and when the payload drops to 0.2, the accuracy is 95.75%. An exception is VGG16 trained on Tiny-ImageNet with \lambda_c = 0.05 , for which the detection accuracy at a payload of 1.0 is 93.80%, lower than that at a payload of 0.2. Nevertheless, even at a low embedding payload of 0.2, our approach remains valid.
4.4
Detection performance of sign encoding
We train the neural networks from scratch on CIFAR10 and Tiny-ImageNet with different model structures, coefficients \lambda_s , and payloads. Table 6 shows the appropriate coefficients for the correlation term. Given the embedded parameters {\boldsymbol{\theta}} and the binary secret {\boldsymbol{m}} , the embedding degree is measured by \dfrac{1}{l}\displaystyle \sum\nolimits_{i = 1}^l {1\left\{ {{\rm{sign}}({\theta _i}) \ne {m_i}} \right\}} , where 1\{ \cdot \} is the indicator function ( 1\{\text{a true statement}\} = 1 and 1\{\text{a false statement}\} = 0 ), l is the length of the secret message {\boldsymbol{m}} , and {\rm{sign}}(\cdot) is the sign function. Table 7 shows the detection performance at payloads of 0.2, 0.6, and 1.0.
Table 6. Results of sign encoding in our models. Test ACC indicates the performance of malicious models trained with \lambda_s = 50.0 and 10.0 .
The results in Table 7 reveal that our approach effectively detects all the models with different \lambda_s at payloads of 1.0, 0.6, and 0.2. For VGG16 trained on Tiny-ImageNet with \lambda_s = 50.0 and 10.0 at payloads of 0.2, 0.6, and 1.0, our method achieves 100.00% detection accuracy. Furthermore, decreasing the payload has no significant effect on the detection accuracy; for example, for EfficientNetB0 trained on CIFAR10 with \lambda_s = 10.0 , the accuracy at a payload of 1.0 is 92.05%, which is lower than those at payloads of 0.6 and 0.2. Even when the payload is low, our detection method remains valid.
4.5
Detection performance of the overall framework
As both the steganography method and the payload used by the malicious developer for an injected model are unknown, we need to apply all three detection methods and then fuse their results. For simplicity, we specify the payloads of the model parameters, detect the models with the three steganalysis methods separately, and fuse the decisions.
Neural networks are trained on CIFAR10 and Tiny-ImageNet with different model structures. For each detection method, 60 benign and 60 malicious models are selected to train the classifier. A total of 40 benign and 120 malicious models, uniformly composed of the three steganography methods, are used for validation. For LSB, the secret message is embedded in the 14th bit plane with a payload of 0.05. For COR, the payload is set to 0.2, and the coefficient \lambda_c is set to the maximum appropriate value; for example, the maximum appropriate \lambda_c is 0.1 for ResNet34 and 0.03 for EfficientNet. For SGN, the payload is 0.2, and the coefficient \lambda_s is 50.0. The missing alarm rate and the false alarm rate are used to evaluate the effectiveness of detection and are defined as
P_{\rm{{MA}}} = \dfrac{{\rm{FN}}}{{\rm{TP+FN}}}
(20)
P_{\rm{FA}} = \dfrac{{\rm{FP}}}{{\rm{TP+FP}}}
(21)
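For completeness, a one-line computation of the two rates as defined in Eqs. (20) and (21):

```python
def alarm_rates(tp: int, fn: int, fp: int):
    # Missing alarm rate (Eq. 20) and false alarm rate (Eq. 21).
    return fn / (tp + fn), fp / (tp + fp)
```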
Table 8 shows the detection results of our overall framework. LSB detection, COR detection, and SGN detection denote detecting all models with the corresponding steganalysis method alone, and framework detection denotes detecting models with our framework. It can be seen that our overall framework effectively detects injected models. The missing alarm rate for framework detection is lower than that for any single detection method, such as LSB detection. Compared with LSB detection, framework detection also has a lower false alarm rate because of the increased number of true positives. However, for COR detection and SGN detection, the false alarm rate is lower than that of framework detection owing to the voting rule of our framework.
Table 8. Detection results of our overall framework. LSB detection means detecting models by LSB encoding steganalysis; COR detection and SGN detection are defined analogously. Framework detection means detecting models by the overall framework.
To validate the detection performance of our framework under unknown payloads, the classifiers in our framework trained on models with lower payloads are used to detect models with higher payloads. As in the framework detection experiment, neural networks are trained on CIFAR10 and Tiny-ImageNet with different model structures. For LSB, the secret message is embedded in the 14th bit plane. For COR, the coefficient \lambda_c is set to the maximum appropriate value. For SGN, the coefficient \lambda_s is 50.0. In our framework, the classifiers for LSB detection are trained with a payload of 0.05, and the classifiers for COR detection and SGN detection are trained with a payload of 0.2. We then validate our framework on models embedded by LSB with payloads of 1.0, 0.8, and 0.4, by COR with payloads of 1.0 and 0.6, and by SGN with payloads of 1.0 and 0.6. For each embedding method and each payload, 40 benign models and 40 malicious models are used for detection.
Table 9 shows the detection results obtained when the framework trained on models with lower payloads is used to detect models with higher payloads. For most of the higher-payload models, our framework, trained on the lower-payload models, detects them effectively. For ResNet34 trained on CIFAR10 and embedded by SGN at a payload of 0.6, the accuracy of our framework is 97.50%; however, for EfficientNet trained on CIFAR10 and embedded by COR at a payload of 0.6, the accuracy is only 68.75%. The results in Tables 8 and 9 reveal that, in most cases, our overall framework trained on lower-payload models can effectively detect models with higher payloads.
Table 9. Results of using the framework trained on models with lower payloads to detect models with higher payloads. In the framework, the classifiers for LSB detection are trained with a payload of 0.05, and the classifiers for COR and SGN detection are trained with a payload of 0.2.
5.
Conclusions
In this paper, we propose steganalysis methods to detect steganography in neural networks. First, we analyze the statistical bias caused by each steganography method. As multiple features can describe each statistical bias, we compare their detection accuracies to find an optimal set of features, which we then use for classification. Various experiments demonstrate the effectiveness of our framework in detecting neural network steganography.
The results in Table 3 reveal that, for LSB encoding steganalysis, our method fails to detect bit planes higher than the 18th. Methods for detecting higher bit planes remain to be explored.
Acknowledgements
This work is supported in part by the Natural Science Foundation of China (62102386, 62002334, 62072421, 62121002), the Fundamental Research Funds for the Central Universities (WK2100000018, WK2100000011), the Exploration Fund Project of University of Science and Technology of China (YD3480002001), and the Open Fund of Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation.
Conflict of Interest
The authors declare that they have no conflict of interest.