TUNIVERSE

Lifelong Disk Failure Prediction via GAN-based Anomaly Detection

2023/01/19

Using GAN

Many supervised machine learning algorithms rely heavily on a substantial amount of annotated failed-disk data, which unfortunately is extremely imbalanced. This leads to suboptimal performance, or even an inability to predict at all early in deployment, i.e., the cold-start problem.

model


1. data processing

① Feature Selection:


② Construction of 2D-SMART attributes:

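A minimal sketch of how a 2D-SMART sample could be built, assuming each disk has one SMART vector per day and that an image-like sample is simply T consecutive days stacked as rows. The function name `to_2d_smart`, the stride of one day, and the rows-are-days layout are assumptions for illustration; the notes only record that T = 12.

```python
from typing import List


def to_2d_smart(daily: List[List[float]], T: int = 12) -> List[List[List[float]]]:
    """Slide a window of T consecutive days over one disk's daily SMART vectors.

    daily[d][k] is attribute k on day d, in chronological order.
    Each returned chunk is a T x K matrix, i.e. one image-like sample.
    """
    if T <= 0:
        raise ValueError("T must be positive")
    # A series shorter than T days yields no chunk at all.
    return [daily[start:start + T] for start in range(len(daily) - T + 1)]


# Toy series: 14 days, 3 attributes per day (values are made up).
series = [[float(day), float(day) * 2, 0.0] for day in range(14)]
chunks = to_2d_smart(series, T=12)
print(len(chunks), len(chunks[0]), len(chunks[0][0]))
```

With 14 days and T = 12 the toy series gives three overlapping 12 x 3 chunks; a non-overlapping stride would be an equally plausible reading of the notes.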

③ Normalization:

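The notes do not record which normalization formula is used, so the following is only a sketch that assumes per-attribute min-max scaling to [0, 1]. The function name `min_max_normalize` is hypothetical.

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Scale one SMART attribute column to [0, 1] (assumed scheme, not confirmed by the notes)."""
    c_min, c_max = min(column), max(column)
    span = c_max - c_min
    if span == 0:
        # A constant attribute carries no information; map it to 0.
        return [0.0] * len(column)
    return [(value - c_min) / span for value in column]


raw = [100.0, 150.0, 200.0]
print(min_max_normalize(raw))  # [0.0, 0.5, 1.0]
```

In practice the minimum and maximum would be computed on the training set and reused for test data, so that both are scaled identically.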

2. GAN-based method

① GAN-based anomaly detection

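The decision step of a reconstruction-based detector can be sketched as follows. This assumes a generator trained only on healthy samples that reconstructs healthy inputs well, so a large reconstruction error signals a likely failure. The score definition, the quantile threshold, and all function names are assumptions for illustration; the notes do not state the exact anomaly score, and `x_hat` stands in for the output of a trained generator.

```python
from typing import List, Sequence


def reconstruction_score(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean absolute difference between a sample and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("sample and reconstruction must have the same length")
    return sum(abs(a - b) for a, b in zip(x, x_hat)) / len(x)


def pick_threshold(healthy_scores: List[float], quantile: float = 0.95) -> float:
    """Use a quantile of the scores seen on healthy samples as the threshold."""
    ordered = sorted(healthy_scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_anomalous(score: float, threshold: float) -> bool:
    """Flag a sample whose score exceeds the healthy-data threshold."""
    return score > threshold


# Made-up scores from healthy validation samples.
healthy_scores = [0.02, 0.03, 0.01, 0.05, 0.04]
threshold = pick_threshold(healthy_scores, quantile=0.8)

# A flattened sample and a made-up reconstruction that misses it badly.
suspect_score = reconstruction_score([0.9, 0.1, 0.5], [0.2, 0.2, 0.4])
print(is_anomalous(suspect_score, threshold))
```

Because the threshold is fitted on healthy data alone, no failed-disk labels are needed at this step.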

② fine-tuning technique for model updating

  • what: the model aging problem, where a previously trained model loses validity on newly arriving SMART data
  • how: transfer information from one dataset to another
    • accumulation updating strategy: reuses the old model and retrains it on newly arriving data
    • other strategy: discards the old model and trains a brand-new model on newly arriving data
  • challenge: sample labeling ⇒ automatic online labeling method proposed by Xiao
    • uses a first-in-first-out queue
    • semi-supervised: only uses healthy samples
    • relaxes the updating frequency: updates with batches of samples (samples are collected at a constant time interval into dataset S; once S is full, they are converted into 2D-SMART attribute chunks)
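The queue-and-batch idea above can be sketched roughly as follows. This is not Xiao's method in detail: it assumes a single global FIFO queue, that a sample is treated as healthy once it has been pushed out of the queue without its disk failing, and that a full dataset S triggers one update. The class `OnlineLabeler`, its parameters `horizon` and `batch_size`, and the `on_batch` hook are all hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Sample = List[float]


class OnlineLabeler:
    """Hold unlabeled samples in a FIFO queue and release them as healthy batches."""

    def __init__(self, horizon: int, batch_size: int,
                 on_batch: Callable[[List[Sample]], None]) -> None:
        self.horizon = horizon        # queue length before a sample is trusted as healthy
        self.batch_size = batch_size  # capacity of dataset S
        self.on_batch = on_batch      # hypothetical hook, e.g. fine-tune the old model
        self.queue: Deque[Tuple[str, Sample]] = deque()
        self.dataset_s: List[Sample] = []

    def observe(self, disk_id: str, sample: Sample) -> None:
        """Enqueue a new sample; the oldest ones leave once the queue is over length."""
        self.queue.append((disk_id, sample))
        while len(self.queue) > self.horizon:
            _, oldest = self.queue.popleft()
            self._add_healthy(oldest)

    def report_failure(self, disk_id: str) -> None:
        """Drop queued samples of a failed disk so they are never labeled healthy."""
        self.queue = deque(item for item in self.queue if item[0] != disk_id)

    def _add_healthy(self, sample: Sample) -> None:
        self.dataset_s.append(sample)
        if len(self.dataset_s) >= self.batch_size:
            batch, self.dataset_s = self.dataset_s, []
            self.on_batch(batch)


batches: List[List[Sample]] = []
labeler = OnlineLabeler(horizon=3, batch_size=2, on_batch=batches.append)
for day in range(4):
    labeler.observe("disk-A", [float(day)])
    labeler.observe("disk-B", [10.0 + day])
labeler.report_failure("disk-B")
print(len(batches), list(labeler.queue))
```

Under the accumulation strategy, `on_batch` would continue training the existing model on each batch; under the other strategy it would train a fresh model instead.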

experiments

1. dataset

  • from Backblaze, spanning 12 months from January 2017 to December 2017


  • disks in each dataset are randomly divided into a training set and a test set in a 7:3 ratio
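A small sketch of a disk-level 7:3 split, assuming disks are identified by a serial-number string. The function name `split_disks` and the fixed seed are illustrative choices, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def split_disks(disk_ids: Sequence[str], train_ratio: float = 0.7,
                seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly assign whole disks to the training set or the test set."""
    ids = sorted(set(disk_ids))       # deduplicate and fix the order before shuffling
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    cut = round(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]


disks = [f"disk-{i:03d}" for i in range(10)]
train, test = split_disks(disks)
print(len(train), len(test))  # 7 3
```

Splitting by disk rather than by sample keeps every record of a given disk on one side, so the test set contains only disks the model has never seen.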

2. algorithms

  • iterations: 1000, lr: 0.01, optimizer: Adam
    1. RF (state-of-the-art): 150 trees
    2. SVM: LIBSVM ⇒ polynomial, sigmoid, and linear kernels
    3. BP: 3 layers with 64 nodes in the hidden layer and ReLU activation
    4. SPA (size of z: 100; T: 12, which is the input shape of the image-like representation)
  • comparison


3. advantages

① avoid the cold-start problem

  • use 2D Image-Like Representation
  • T = 12


② alleviate the model aging problem

  • use fine-tuning


conclusion

Compared with the state-of-the-art supervised machine learning based methods, our approach predicts disk failures at a higher accuracy for the entire lifetime of models, i.e., both the initial period and the long-term usage.
