
Time Series Price Anomaly Detection Based on Machine Learning Algorithms (with …) (4)

2019-02-08 17:33 量化投资与机器学习 (Quantitative Investment and Machine Learning)

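Clustering-based anomaly detection rests on one basic assumption: if we partition the data into clusters, normal data will belong to clusters, while anomalous data will either belong to no cluster or fall into small clusters. We use the following steps to find and visualize the anomalies.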
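To make the assumption concrete, here is a minimal sketch that flags points sitting in small clusters or in no cluster at all. It is an illustration only, not the method used in this article: the function name flag_small_clusters, the min_share parameter, the toy labels, and the use of -1 as a "no cluster" label are all assumptions made for this example.

```python
from collections import Counter

def flag_small_clusters(labels, min_share):
    """Return 1 for points in a small cluster or in no cluster, else 0."""
    counts = Counter(labels)
    n = len(labels)
    # A cluster is "small" if it holds less than min_share of all points.
    small = {c for c, k in counts.items() if k / n < min_share}
    # -1 is treated here as a hypothetical "belongs to no cluster" label.
    return [1 if (lab in small or lab == -1) else 0 for lab in labels]

# Hypothetical cluster labels: two large clusters, one singleton, one unassigned point.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, -1, 0]
flags = flag_small_clusters(labels, min_share=0.1)
print(flags)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
```

The steps that follow take a different route to the same idea: instead of looking at cluster size, they look at how far each point lies from its centroid.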

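Calculate the distance between each point and its nearest centroid; the largest distances are considered anomalous. We use outliers_fraction to tell the algorithm what proportion of the data set is expected to be outliers, and the appropriate value differs from one data set to another. As a starting point, I set outliers_fraction = 0.01, a rough estimate motivated by the share of observations in a standard normal distribution whose Z score lies more than 3 away from the mean in absolute value (that share is about 0.27%, so 0.01 is a somewhat more generous estimate). Use outliers_fraction to compute number_of_outliers. Set threshold to the smallest distance among these outliers. The detection result anomaly1 holds the outcome of this method (0: normal, 1: anomaly). Visualize the anomalies in the cluster view. Visualize the anomalies in the time series view.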
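The distance and threshold steps above can be sketched on toy data using only the Python standard library. This is a simplified illustration under stated assumptions, not the article's implementation: the function name flag_by_centroid_distance, the toy points, and the hand-picked centroids are hypothetical, and the max(1, ...) guard against an empty outlier set is an addition that the article's code does not have.

```python
import math

def flag_by_centroid_distance(points, centroids, outliers_fraction):
    """Flag the points that lie farthest from their nearest centroid."""
    # Distance from each point to its nearest centroid.
    distances = [min(math.dist(p, c) for c in centroids) for p in points]
    # Number of points the assumed outlier fraction corresponds to
    # (forced to at least 1 here; this guard is an assumption of the sketch).
    number_of_outliers = max(1, int(outliers_fraction * len(points)))
    # Threshold: the smallest distance among the number_of_outliers largest ones.
    threshold = sorted(distances, reverse=True)[number_of_outliers - 1]
    # 0: normal, 1: anomaly.
    flags = [1 if d >= threshold else 0 for d in distances]
    return distances, threshold, flags

# Hypothetical toy data: two tight groups plus one far-away point at (9, 0).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0),
          (5.0, 5.1), (9.0, 0.0), (0.1, 0.1), (4.9, 5.0), (5.0, 4.9)]
centroids = [(0.05, 0.05), (5.0, 5.0)]  # assumed to come from a fitted clustering

distances, threshold, flags = flag_by_centroid_distance(points, centroids, 0.1)
print(flags)  # [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

Because the comparison is d >= threshold, ties at the threshold distance can flag slightly more points than number_of_outliers.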

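import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def getDistanceByPoint(data, model):
    # Distance from each row of data to the centroid of the cluster it was assigned to
    distances = []
    for i in range(0, len(data)):
        Xa = np.array(data.loc[i])
        # labels_[i] indexes cluster_centers_ directly, so no offset is needed
        Xb = model.cluster_centers_[model.labels_[i]]
        distances.append(np.linalg.norm(Xa - Xb))
    return pd.Series(distances)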

outliers_fraction = 0.01
# Distance between each point and its nearest centroid; the largest distances are treated as anomalies
distance = getDistanceByPoint(data, kmeans[9])
number_of_outliers = int(outliers_fraction * len(distance))
threshold = distance.nlargest(number_of_outliers).min()
# anomaly1 contains the result of the clustering method above (0: normal, 1: anomaly)
df['anomaly1'] = (distance >= threshold).astype(int)
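The outliers_fraction = 0.01 setting is motivated by the tails of the standard normal distribution. As a quick sanity check of that reasoning, the following sketch uses only the Python standard library to compute the two-sided tail beyond |Z| = 3 and the |Z| cutoff that corresponds to a 1% tail. The variable names are chosen for this example and do not appear in the article.

```python
import math
from statistics import NormalDist

# Two-sided tail probability P(|Z| > 3) for a standard normal variable.
tail_beyond_3 = math.erfc(3 / math.sqrt(2))

# |Z| value whose two-sided tail probability is 0.01.
z_for_one_percent = NormalDist().inv_cdf(1 - 0.01 / 2)

print(f"P(|Z| > 3) = {tail_beyond_3:.4f}")                         # P(|Z| > 3) = 0.0027
print(f"|Z| cutoff for a 1% tail = {z_for_one_percent:.3f}")       # |Z| cutoff for a 1% tail = 2.576
```

In other words, 0.01 flags somewhat more points than a strict 3-sigma rule would, which seems reasonable for an initial estimate that is meant to be tuned per data set.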

# Visualize the anomalies in the cluster view
fig, ax = plt.subplots(figsize=(10,6))
colors = {0:'blue', 1:'red'}
ax.scatter(df['principal_feature1'], df['principal_feature2'], c=df["anomaly1"].apply(lambda x: colors[x]))
plt.xlabel('principal feature1')
plt.ylabel('principal feature2')
plt.show()

# Visualize the anomalies in the time series view
df = df.sort_values('date_time')
df['date_time_int'] = df.date_time.astype(np.int64)
fig, ax = plt.subplots(figsize=(10,6))
a = df.loc[df['anomaly1'] == 1, ['date_time_int', 'price_usd']]  # anomalies
ax.plot(df['date_time_int'], df['price_usd'], color='blue', label='Normal')
ax.scatter(a['date_time_int'], a['price_usd'], color='red', label='Anomaly')
plt.xlabel('Date Time Integer')
plt.ylabel('price in USD')
plt.legend()
plt.show()

It appears that the anomalous prices identified by the k-means clustering algorithm are either very high rates or very low rates.

Anomaly Detection Based on the Isolation Forest Algorithm
