
Decrypting Time Series Outliers: 2/4

Published November 29, 2023 by 四海吧

What happened yesterday?

After handing out coffee to everyone, Morelli, Zappa and I went over what happened yesterday:


Decrypting Time Series Outliers: 1/4. Rovella and the Rebel Data (pub.towardsai.net)

We started from the tweets about #rovella, a time series full of outliers, and used two basic statistics to flag them clearly: the mean and the standard deviation.

import pandas as pd
import numpy as np

link = 'https://raw.githubusercontent.com/ianni-phd/Datasets/main/rovella_tweets.csv'
tweets = pd.read_csv(link, sep=';', decimal=',', index_col='date', parse_dates=['date'])
tweets_series = tweets['target']
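The next snippet is a minimal sketch of the mean-and-standard-deviation idea. It is not the article's code. It flags every point whose distance from the mean is more than `thres` standard deviations. The data is a small synthetic series rather than the #rovella CSV, so the snippet runs offline. The helper name `flag_outliers_global_zscore` and the data are hypothetical.

```python
import numpy as np
import pandas as pd


def flag_outliers_global_zscore(series: pd.Series, thres: float = 3.0) -> pd.Series:
    """Boolean mask, True where |x - mean| / std exceeds thres.

    Hypothetical helper: mean and std are computed once, on the whole series.
    """
    mean = series.mean()
    std = series.std(ddof=0)  # population std, like np.std's default
    z_scores = (series - mean) / std
    return z_scores.abs() > thres


# Synthetic daily series (hypothetical data, not the #rovella tweets):
# a flat baseline of 10 with two injected spikes.
idx = pd.date_range("2023-01-01", periods=100, freq="D")
values = np.full(100, 10.0)
values[30] = 200.0
values[70] = 150.0
demo = pd.Series(values, index=idx)

mask = flag_outliers_global_zscore(demo)
print(demo[mask])  # the two spikes, with their dates
```

Both spikes are caught here. However, they also inflate the standard deviation they are measured against. That is presumably one reason the function further down recomputes its statistics as it walks through the series and leaves out dates that were already flagged.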

Then we started cutting them down mercilessly, as if with a chainsaw.

Cutting-points work: 3 2 1… go! (image by the author)
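The article's own cut appears in the function that follows. That function seems to work from a local maximum (`max_so_far`) and a `surplus` term, and the excerpt is truncated. A simpler way to picture the "chainsaw" is to cap every value above mean + thres * std at that ceiling. This is a swapped-in technique, not the author's method. The helper `cap_outliers` and the synthetic data are hypothetical. Only the upper tail is cut, on the assumption that tweet-count spikes point upward.

```python
import numpy as np
import pandas as pd


def cap_outliers(series: pd.Series, thres: float = 3.0) -> pd.Series:
    """Cut every value above mean + thres * std down to that ceiling.

    Hypothetical helper: only the upper tail is cut, assuming spikes point up.
    """
    mean = series.mean()
    std = series.std(ddof=0)  # population std
    ceiling = mean + thres * std
    return series.clip(upper=ceiling)


# Synthetic daily series (hypothetical data): flat baseline plus two spikes.
idx = pd.date_range("2023-01-01", periods=100, freq="D")
values = np.full(100, 10.0)
values[30] = 200.0
values[70] = 150.0
demo = pd.Series(values, index=idx)

capped = cap_outliers(demo)
print(demo.max(), "->", round(capped.max(), 2))
```

Clipping keeps every timestamp and only shortens the spikes. With a single global ceiling, though, the cut still depends on statistics inflated by the very outliers being removed.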

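# function DEFINITION
def detect_outliers_zscore(ts, thres=3, points_not_to_touch=60, max_window=40, outliers_param=0.9):
    '''
    param ts                  : time series with a datetime index
    param thres               : threshold; values above 3 make outlier detection stricter
    param points_not_to_touch : number of initial points left untouched
    param max_window          : window considered when computing the local maximum
    param outliers_param      : in [0, 1]; lower if I want to follow the outliers
    '''
    ts_reworked = ts.copy(deep=True)
    outliers = []
    dates = []
    for i, d in zip(ts, ts.index):
        # expanding window: all points up to and including the current date
        ts_so_far = ts[ts.index <= d]
        # leave the first points_not_to_touch points out of the statistics
        ts_so_far = ts_so_far.iloc[points_not_to_touch:]
        # exclude the dates already flagged as outliers
        ts_so_far = ts_so_far[~ts_so_far.index.isin(dates)]
        length_so_far = ts_so_far.shape[0]
        mean = np.mean(ts_so_far)
        std = np.std(ts_so_far)
        # maximum of the window, leaving out the most recent max_window points
        max_so_far = np.max(ts_so_far.iloc[:-max_window])

        surplus = (outliers_param * (i - max_so_far))…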
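The excerpt stops before the flagging and replacement steps, so the sketch below does not reproduce them. It only illustrates the expanding-window idea visible in the loop:

- for each date, compute the mean and standard deviation on the points seen so far;
- skip the first `points_not_to_touch` points and any dates already flagged;
- flag the current point if its z-score exceeds `thres`.

The name `flag_outliers_expanding`, the smaller default for `points_not_to_touch`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def flag_outliers_expanding(ts: pd.Series, thres: float = 3.0,
                            points_not_to_touch: int = 10) -> list:
    """Return the dates whose z-score on the data seen so far exceeds thres.

    Hypothetical, simplified sketch: it only flags points and never
    replaces them.
    """
    flagged = []
    for d, value in ts.items():
        # Expanding window: every observation up to and including date d...
        so_far = ts[ts.index <= d]
        # ...without the first points_not_to_touch observations...
        so_far = so_far.iloc[points_not_to_touch:]
        # ...and without the dates already flagged as outliers.
        so_far = so_far[~so_far.index.isin(flagged)]
        if len(so_far) < 2:
            continue  # too little history to estimate a spread
        mean = np.mean(so_far)
        std = np.std(so_far)  # population std (ddof=0)
        if std > 0 and abs(value - mean) / std > thres:
            flagged.append(d)
    return flagged


# Synthetic daily series (hypothetical data): a repeating 10-11-12
# baseline with two injected spikes.
idx = pd.date_range("2023-01-01", periods=100, freq="D")
values = 10.0 + (np.arange(100) % 3)
values[50] = 200.0
values[80] = 150.0
demo = pd.Series(values, index=idx)

print(flag_outliers_expanding(demo))
```

Like the original loop, this recomputes the statistics at every step, so its cost grows quadratically with the length of the series. That is fine for a few hundred points but worth keeping in mind for longer series.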

Published in 四海

