将趋势线添加到熊猫
machine-learning
pandas
python
6
0

我有时间序列数据,如下所示:

                  emplvl
date                    
2003-01-01  10955.000000
2003-04-01  11090.333333
2003-07-01  11157.000000
2003-10-01  11335.666667
2004-01-01  11045.000000
2004-04-01  11175.666667
2004-07-01  11135.666667
2004-10-01  11480.333333
2005-01-01  11441.000000
2005-04-01  11531.000000
2005-07-01  11320.000000
2005-10-01  11516.666667
2006-01-01  11291.000000
2006-04-01  11223.000000
2006-07-01  11230.000000
2006-10-01  11293.000000
2007-01-01  11126.666667
2007-04-01  11383.666667
2007-07-01  11535.666667
2007-10-01  11567.333333
2008-01-01  11226.666667
2008-04-01  11342.000000
2008-07-01  11201.666667
2008-10-01  11321.000000
2009-01-01  11082.333333
2009-04-01  11099.000000
2009-07-01  10905.666667

时间序列图

我想以最简单的方式在此图中添加线性趋势(带有截距)。另外,我只想以2006年之前的数据为条件来计算这种趋势。

我在这里找到了一些答案,但它们都包含statsmodels 。首先,这些答案可能不是最新的: pandas得到了改进,现在它本身包含了OLS组件。其次, statsmodels似乎估计每个时间段的单个固定效应,而不是线性趋势。我想我可以重新计算一个运行季度的变量,但是最可行的方法是这样做吗?

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 emplvl   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                    nan
Method:                 Least Squares   F-statistic:                     0.000
Date:                tor, 14 apr 2016   Prob (F-statistic):                nan
Time:                        17:17:43   Log-Likelihood:                 929.85
No. Observations:                  40   AIC:                            -1780.
Df Residuals:                       0   BIC:                            -1712.
Df Model:                          39                                         
Covariance Type:            nonrobust                                         
============================================================================================================
                                               coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------------------------------------
Intercept                                 1.095e+04        inf          0        nan           nan       nan
date[T.Timestamp('2003-04-01 00:00:00')]   135.3333        inf          0        nan           nan       nan
date[T.Timestamp('2003-07-01 00:00:00')]   202.0000        inf          0        nan           nan       nan
date[T.Timestamp('2003-10-01 00:00:00')]   380.6667        inf          0        nan           nan       nan
date[T.Timestamp('2004-01-01 00:00:00')]    90.0000        inf          0        nan           nan       nan
date[T.Timestamp('2004-04-01 00:00:00')]   220.6667        inf          0        nan           nan       nan

如何以最简单的方式估算此趋势并将预测值作为列添加到我的数据框中?

参考资料:
Stack Overflow
收藏
评论
共 2 个回答
高赞 时间 活跃

通常,您应该提前创建matplotlib图形和轴对象,并在其上明确绘制数据框:

from matplotlib import pyplot
import pandas
import statsmodels.api as sm

df = pandas.read_csv(...)

fig, ax = pyplot.subplots()
df.plot(x='xcol', y='ycol', ax=ax)

然后,仍然有该轴对象可直接用于绘制线条:

model = sm.formula.ols(formula='ycol ~ xcol', data=df)
res = model.fit()
df.assign(fit=res.fittedvalues).plot(x='xcol', y='fit', ax=ax)
收藏
评论

这是有关如何使用pandas.ols进行此操作的简单示例:

import matplotlib.pyplot as plt
import pandas as pd

x = pd.Series(np.arange(50))
y = pd.Series(10 + (2 * x + np.random.randint(-5, + 5, 50)))
regression = pd.ols(y=y, x=x)
regression.summary

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         50
Number of Degrees of Freedom:   2

R-squared:         0.9913
Adj R-squared:     0.9911

Rmse:              2.7625

F-stat (1, 48):  5465.1446, p-value:     0.0000

Degrees of Freedom: model 1, resid 48

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     2.0013     0.0271      73.93     0.0000     1.9483     2.0544
     intercept     9.5271     0.7698      12.38     0.0000     8.0183    11.0358
---------------------------------End of Summary---------------------------------

trend = regression.predict(beta=regression.beta, x=x[20:]) # slicing to only use last 30 points
data = pd.DataFrame(index=x, data={'y': y, 'trend': trend})
data.plot() # add kwargs for title and other layout/design aspects
plt.show() # or plt.gcf().savefig(path)

在此处输入图片说明

收藏
评论
新手导航
  • 社区规范
  • 提出问题
  • 进行投票
  • 个人资料
  • 优化问题
  • 回答问题

关于我们

常见问题

内容许可

联系我们

@2020 AskGo
京ICP备20001863号