Python linear regression diagnostic plots similar to R


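I want to produce diagnostic plots for a linear regression in Python, and I am wondering whether there is a quick way to do it.

In R, you can use the snippet below, which gives you a Residuals vs Fitted plot, a Normal Q-Q plot, a Scale-Location plot, and a Residuals vs Leverage plot.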

m1 <- lm(cost~ distance, data = df1)
summary(m1)
plot(m1)
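Is there a quick way to do this in Python?
There is a good blog post that describes how to get the same plots that R gives you using Python code, but it requires quite a lot of code (at least compared with the R approach). Link: https://underthecurve.github.io/jekyll/update/2016/07/01/one-regression-six-ways.html#Python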


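1 comment

Kostas Mouratidis:
You can create a function/module, then import it and use it as a one-liner such as my_plot(formula, data). That is also what R does under the hood. Some R code that might be the source of plot (not sure, sorry): github.com/SurajGupta/r-source/blob/master/src/library/stats/R/...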


python, plot, regression, linear-regression


1 answer

Accepted (0 votes)


          
I like to store everything in pandas and, wherever possible, plot with DataFrame.plot().
          
from matplotlib import pyplot as plt
from pandas import DataFrame
import statsmodels.api as sm


def linear_regression(df: DataFrame) -> DataFrame:
    """Perform a univariate regression and store results in a new data frame.

    Args:
        df (DataFrame): original data set with columns 'x' and 'y'.

    Returns:
        DataFrame: a copy of the raw data with the regression results added.
    """
    # add_constant gives the model an intercept, as R's lm() does by default.
    mod = sm.OLS(endog=df['y'], exog=sm.add_constant(df['x'])).fit()
    influence = mod.get_influence()
    res = df.copy()
    res['resid'] = mod.resid
    res['fittedvalues'] = mod.fittedvalues
    # Internally studentized residuals, the counterpart of R's rstandard().
    res['resid_std'] = influence.resid_studentized_internal
    res['leverage'] = influence.hat_matrix_diag
    return res


def plot_diagnosis(df: DataFrame):
    """Draw the four R-style diagnostic plots from linear_regression() output."""
    # The 'seaborn' style was renamed to 'seaborn-v0_8' in matplotlib 3.6.
    style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
    plt.style.use(style)  # set the style before the figure is created
    fig, axes = plt.subplots(nrows=2, ncols=2)

    # Residuals against fitted values.
    df.plot.scatter(x='fittedvalues', y='resid', ax=axes[0, 0])
    axes[0, 0].axhline(y=0, color='grey', linestyle='dashed')
    axes[0, 0].set_xlabel('Fitted Values')
    axes[0, 0].set_ylabel('Residuals')
    axes[0, 0].set_title('Residuals vs Fitted')

    # Normal Q-Q plot; fit=True rescales the residuals so the 45-degree line applies.
    sm.qqplot(df['resid'], fit=True, line='45', ax=axes[0, 1])
    axes[0, 1].set_title('Normal Q-Q')

    # The scale-location plot: sqrt(|standardized residuals|) against fitted values.
    df.assign(resid_std_sqrt=df['resid_std'].abs() ** 0.5).plot.scatter(
        x='fittedvalues', y='resid_std_sqrt', ax=axes[1, 0]
    )
    axes[1, 0].set_xlabel('Fitted values')
    axes[1, 0].set_ylabel('Sqrt(|standardized residuals|)')
    axes[1, 0].set_title('Scale-Location')

    # Standardized residuals vs. leverage.
    df.plot.scatter(x='leverage', y='resid_std', ax=axes[1, 1])
    axes[1, 1].axhline(y=0, color='grey', linestyle='dashed')
    axes[1, 1].set_xlabel('Leverage')
    axes[1, 1].set_ylabel('Standardized residuals')
    axes[1, 1].set_title('Residuals vs Leverage')

    plt.tight_layout()