Time Series Price Anomaly Detection Based on Machine Learning Algorithms (with … ) (8)

2019-02-08 17:33 量化投资与机器学习 (Quantitative Investment and Machine Learning)

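# Choose a sliding window size (length of the sequence to evaluate) and a threshold.
# getTransitionMatrix and anomalyElement are assumed to be defined earlier (not shown in this section).
def markovAnomaly(df, windows_size, threshold):
    transition_matrix = getTransitionMatrix(df)
    # The per-step threshold is raised to the window length before it is passed on.
    real_threshold = threshold**windows_size
    df_anomaly = []
    for j in range(0, len(df)):
        if j < windows_size:
            # Not enough history for a full window yet: label as normal (0).
            df_anomaly.append(0)
        else:
            # Evaluate the windows_size values that precede position j.
            sequence = df[j-windows_size:j]
            sequence = sequence.reset_index(drop=True)
            df_anomaly.append(anomalyElement(sequence, real_threshold, transition_matrix))
    return df_anomaly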
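The helpers used by markovAnomaly are not shown in this section, so the following is only a minimal, self-contained sketch of the same sliding-window idea. It assumes the series has already been discretized into integer states 0..n_states-1, and it assumes a window is flagged when the product of its transition probabilities falls below threshold ** window_size. The names transition_matrix, sequence_probability and markov_flags are hypothetical stand-ins, not the article's getTransitionMatrix and anomalyElement, whose exact scoring rule may differ.

```python
import numpy as np

def transition_matrix(states, n_states):
    """Estimate first-order transition probabilities by counting state pairs."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows without any observed transition stay all-zero (no division by zero).
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def sequence_probability(seq, tm):
    """Product of the transition probabilities along seq (assumed scoring rule)."""
    p = 1.0
    for a, b in zip(seq[:-1], seq[1:]):
        p *= tm[a, b]
    return p

def markov_flags(states, window_size, threshold, n_states):
    """Return one 0/1 label per position, 1 meaning the preceding window is unlikely."""
    tm = transition_matrix(states, n_states)
    real_threshold = threshold ** window_size
    flags = []
    for j in range(len(states)):
        if j < window_size:
            flags.append(0)  # not enough history yet
        else:
            window = states[j - window_size:j]
            flags.append(int(sequence_probability(window, tm) < real_threshold))
    return flags

# Toy data: a regular 0,1,0,1,... pattern interrupted by an unusual run of 1s.
states = [0, 1] * 20 + [1, 1, 1] + [0, 1] * 5
flags = markov_flags(states, window_size=4, threshold=0.5, n_states=2)
flagged = [i for i, f in enumerate(flags) if f]
print(flagged)  # positions whose preceding window contains the unusual run
```

Because the threshold is raised to the window length, a single rare transition inside a window is not enough to trigger a flag in this toy example; only windows containing at least two of the rare 1-to-1 transitions fall below the cut-off.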

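import matplotlib.pyplot as plt

# df_anomaly is assumed to hold the 0/1 labels returned by markovAnomaly (the call is not shown in this section).
df['anomaly24'] = df_anomaly

# Plot the price over time and mark the detected anomalies in red.
fig, ax = plt.subplots(figsize=(10, 6))
a = df.loc[df['anomaly24'] == 1, ['date_time_int', 'price_usd']]  # anomalies
ax.plot(df['date_time_int'], df['price_usd'], color='blue')
ax.scatter(a['date_time_int'], a['price_usd'], color='red')
plt.show()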

# Compare the price distributions of normal (blue) and anomalous (red) points.
a = df.loc[df['anomaly24'] == 0, 'price_usd']
b = df.loc[df['anomaly24'] == 1, 'price_usd']
fig, axs = plt.subplots(figsize=(16, 6))
axs.hist([a, b], bins=32, stacked=True, color=['blue', 'red'])
plt.show()

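Our anomaly detection is unsupervised, so once a model is built we cannot tell how well it performs: there is no ground truth to test it against. The results of these methods therefore need to be tested in the field before the methods are placed on a critical path.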

Summary

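So far, we have performed price anomaly detection with five different methods. Because the detection is unsupervised, we have no labeled data to test the models against and thus no way of knowing how effective these methods are. Before relying on them in any important setting, be sure to test them on live data.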
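One possible form of such a test, assuming a small sample of points can be labeled by hand, is to compare the detector's flags with those labels and report precision and recall. The sketch below is illustrative only: the function name precision_recall is hypothetical, and the flags and labels are made-up values, not results from the article's data.

```python
def precision_recall(flags, labels):
    """Compare 0/1 detector flags with 0/1 hand-assigned labels."""
    tp = sum(1 for f, y in zip(flags, labels) if f == 1 and y == 1)
    fp = sum(1 for f, y in zip(flags, labels) if f == 1 and y == 0)
    fn = sum(1 for f, y in zip(flags, labels) if f == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up example: detector output vs. manual review of the same eight points.
flags = [0, 1, 0, 1, 1, 0, 0, 1]
labels = [0, 1, 0, 0, 1, 0, 1, 1]
precision, recall = precision_recall(flags, labels)
print(precision, recall)  # 0.75 0.75
```

Low precision would mean many flagged prices are actually normal, while low recall would mean real anomalies are being missed; which of the two matters more depends on where the detector is deployed.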

References:

1. https://www.datascience.com/blog/python-anomaly-detection

2. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html

3. https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html

4. https://scikit-learn.org/stable/modules/generated/sklearn.covariance.EllipticEnvelope.html

5. https://www.kaggle.com/victorambonati/unsupervised-anomaly-detection

