Posted by 自学气象人 on 2024-3-15 23:25:10

Code Summary for 50 Common Statistical Charts (8)


36. Time Series with Peaks and Troughs Annotated
The time series below marks all the peaks and troughs and annotates a selection of them with their dates.
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

# Get the Peaks and Troughs
# np.where returns a tuple of index arrays, so take [0] before shifting the indices by 1
data = df['traffic'].values
doublediff = np.diff(np.sign(np.diff(data)))
peak_locations = np.where(doublediff == -2)[0] + 1

doublediff2 = np.diff(np.sign(np.diff(-1*data)))
trough_locations = np.where(doublediff2 == -2)[0] + 1

# Draw Plot
plt.figure(figsize=(16,10), dpi=80)
plt.plot('date', 'traffic', data=df, color='tab:blue', label='Air Traffic')
plt.scatter(df.date[peak_locations], df.traffic[peak_locations], marker=mpl.markers.CARETUPBASE, color='tab:green', s=100, label='Peaks')
plt.scatter(df.date[trough_locations], df.traffic[trough_locations], marker=mpl.markers.CARETDOWNBASE, color='tab:red', s=100, label='Troughs')

# Annotate every third peak, plus the same number of troughs (taken from the start), with their dates
for t, p in zip(trough_locations, peak_locations[::3]):
    plt.text(df.date[p], df.traffic[p]+15, df.date[p], horizontalalignment='center', color='darkgreen')
    plt.text(df.date[t], df.traffic[t]-35, df.date[t], horizontalalignment='center', color='darkred')

# Decoration
plt.ylim(50,750)
xtick_location = df.index.tolist()[::6]
xtick_labels = df.date.tolist()[::6]
plt.xticks(ticks=xtick_location, labels=xtick_labels, rotation=90, fontsize=12, alpha=.7)
plt.title("Peaks and Troughs of Air Passengers Traffic (1949 - 1960)", fontsize=22)
plt.yticks(fontsize=12, alpha=.7)

# Lighten borders
plt.gca().spines["top"].set_alpha(.0)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.0)
plt.gca().spines["left"].set_alpha(.3)

plt.legend(loc='upper left')
plt.grid(axis='y', alpha=.3)
plt.show()
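
The peak and trough detection relies on a sign trick: np.sign(np.diff(data)) is +1 where the series rises and -1 where it falls, so a second difference of -2 marks a rise followed by a fall, and negating the series turns troughs into peaks. The sketch below wraps that logic in a small helper (the name find_peaks_troughs is hypothetical, not from the original) and runs it on a toy list so the index arithmetic can be checked by hand.

```python
import numpy as np

def find_peaks_troughs(data):
    """Return indices of strict local maxima (peaks) and minima (troughs)."""
    data = np.asarray(data, dtype=float)
    # Sign of the first difference: +1 rising, -1 falling, 0 flat
    slope_sign = np.sign(np.diff(data))
    # A step from +1 to -1 (difference of -2) marks a peak;
    # adding 1 maps the position back onto the original series
    peaks = np.where(np.diff(slope_sign) == -2)[0] + 1
    # The same rule applied to the negated series finds the troughs
    troughs = np.where(np.diff(np.sign(np.diff(-data))) == -2)[0] + 1
    return peaks, troughs

peaks, troughs = find_peaks_troughs([1, 3, 2, 4, 1])
print(peaks, troughs)  # [1 3] [2]
```

One limitation of this approach: a flat-topped peak such as [1, 3, 3, 1] produces steps of -1 rather than -2, so it is not detected. If that matters, scipy.signal.find_peaks handles plateaus (it reports the middle sample), at the cost of an extra dependency.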


37. Autocorrelation (ACF) and Partial Autocorrelation (PACF) Plot
The ACF plot shows the correlation of a time series with its own lagged values. Each vertical line on the autocorrelation plot represents the correlation between the series and its lag k, starting from lag 0 (which is always 1). The blue shaded region marks the significance bounds: lags whose lines extend beyond it are the significant lags.

So how should this be interpreted?

For AirPassengers, up to 14 lags cross the blue region and are therefore significant. Because the data is monthly, this means the air passenger traffic of up to 14 months earlier is correlated with the traffic seen today.
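
To make the ACF definition concrete, here is a minimal NumPy sketch of the sample autocorrelation (the helper name sample_acf is illustrative). It uses the common estimator that divides every lag by the lag-0 sum of squares, which should agree with statsmodels' acf under its default, non-adjusted setting, together with the simple ±1.96/√n band that would apply if the series were white noise.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation r_k for lags k = 0..nlags.

    r_k = sum_t (x[t] - m) * (x[t-k] - m) / sum_t (x[t] - m)**2, with m the mean.
    """
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    lags = [np.sum(dev[k:] * dev[:-k]) / denom for k in range(1, nlags + 1)]
    return np.array([1.0] + lags)

x = [1, 2, 3, 4, 5]
r = sample_acf(x, nlags=2)               # r[0] = 1.0, r[1] = 0.4, r[2] = -0.1
band = 1.96 / np.sqrt(len(x))            # approximate 95% bound under white noise
significant = np.nonzero(np.abs(r[1:]) > band)[0] + 1   # lags (from 1) outside the band
```

Note that in recent statsmodels versions plot_acf draws its shaded region with Bartlett's formula by default, so the band widens as the lag grows instead of staying flat at ±1.96/√n.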

PACF, on the other hand, shows the correlation between the series and a given lag after removing the contributions of the lags in between.
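
The idea of removing the contributions of the intermediate lags can be expressed as a regression: the partial autocorrelation at lag k is the coefficient on x[t-k] when x[t] is regressed on x[t-1], ..., x[t-k]. The sketch below implements that regression-based definition with NumPy least squares (pacf_ols is a hypothetical helper name) and checks it on a simulated AR(1) series, whose PACF should be close to the AR coefficient at lag 1 and near zero beyond. statsmodels' pacf supports several estimators, and the one used by plot_pacf by default is a Yule-Walker variant rather than this one, so values may differ slightly.

```python
import numpy as np

def pacf_ols(x, nlags):
    """Partial autocorrelation via regressions: the lag-k value is the coefficient
    on x[t-k] when x[t] is regressed on an intercept and x[t-1], ..., x[t-k]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    values = [1.0]
    for k in range(1, nlags + 1):
        target = x[k:]
        # Column j (1..k) holds x[t-j], aligned with target = x[t]
        lagged = [x[k - j:n - j] for j in range(1, k + 1)]
        design = np.column_stack([np.ones(n - k)] + lagged)
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        values.append(coef[-1])   # coefficient on the longest lag, x[t-k]
    return np.array(values)

# Simulated AR(1) series x[t] = 0.8 * x[t-1] + noise (synthetic data, for illustration)
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]
p = pacf_ols(ar1, nlags=3)   # p[1] should be near 0.8, p[2] and p[3] near 0
```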

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')

# Draw Plot: ACF up to lag 50 (left) and PACF up to lag 20 (right)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16,6), dpi=80)
plot_acf(df.traffic.tolist(), ax=ax1, lags=50)
plot_pacf(df.traffic.tolist(), ax=ax2, lags=20)

# Decorate
# lighten the borders
ax1.spines["top"].set_alpha(.3); ax2.spines["top"].set_alpha(.3)
ax1.spines["bottom"].set_alpha(.3); ax2.spines["bottom"].set_alpha(.3)
ax1.spines["right"].set_alpha(.3); ax2.spines["right"].set_alpha(.3)
ax1.spines["left"].set_alpha(.3); ax2.spines["left"].set_alpha(.3)

# font size of tick labels
ax1.tick_params(axis='both', labelsize=12)
ax2.tick_params(axis='both', labelsize=12)
plt.show()



38. Cross Correlation Plot
A cross correlation plot shows how strongly two time series are correlated with each other at different lags.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.tsa.stattools as stattools

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mortality.csv')
x = df['mdeaths']
y = df['fdeaths']

# Compute Cross Correlations (keep at most the first 100 lags)
ccs = stattools.ccf(x, y)[:100]
nlags = len(ccs)

# Compute the Significance level: approximately +/- 2/sqrt(N), N = number of observations
# ref: https://stats.stackexchange.com/questions/3115/cross-correlation-significance-in-r/3128#3128
conf_level = 2 / np.sqrt(len(x))

# Draw Plot
plt.figure(figsize=(12,7), dpi=80)

plt.hlines(0, xmin=0, xmax=100, color='gray')  # 0 axis
plt.hlines(conf_level, xmin=0, xmax=100, color='gray')
plt.hlines(-conf_level, xmin=0, xmax=100, color='gray')

plt.bar(x=np.arange(len(ccs)), height=ccs, width=.3)

# Decoration (raw string so the mathtext spacing escapes "\;" reach Matplotlib unchanged)
plt.title(r'$Cross\; Correlation\; Plot:\; mdeaths\; vs\; fdeaths$', fontsize=22)
plt.xlim(0, len(ccs))
plt.show()
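
For reference, the cross correlation at lag k here means the correlation between x[t+k] and y[t]. The sketch below computes it directly with NumPy (cross_corr is a hypothetical helper) alongside a ±2/√N significance band, N being the number of observations, which is the rule of thumb behind the conf_level line. It divides by N at every lag; statsmodels' ccf divides by N - k by default (its adjusted argument), so its values are somewhat larger in magnitude at long lags.

```python
import numpy as np

def cross_corr(x, y, nlags):
    """Correlation between x[t+k] and y[t] for k = 0..nlags-1, dividing by N at every lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xd, yd = x - x.mean(), y - y.mean()
    scale = n * x.std() * y.std()      # population standard deviations (ddof=0)
    return np.array([np.sum(xd[k:] * yd[:n - k]) / scale for k in range(nlags)])

# Toy data: y spikes one step before x, so x[t+1] lines up with y[t]
x = [0, 0, 0, 1, 0, 0]
y = [0, 0, 1, 0, 0, 0]
ccs = cross_corr(x, y, nlags=3)
conf_level = 2 / np.sqrt(len(x))       # approximate 95% significance band
best_lag = int(np.argmax(ccs))         # 1 for this toy example
```

Correlating a series with itself gives back its autocorrelation, which is a quick sanity check on the lag convention.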




39. Time Series Decomposition Plot
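A time series decomposition plot shows the breakdown of a time series into trend, seasonal and residual components. With model='multiplicative', the series is modeled as the product trend × seasonal × residual, which suits AirPassengers because its seasonal swings grow with the level of the series.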
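import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from dateutil.parser import parse

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/AirPassengers.csv')
# Build a monthly DatetimeIndex (first day of each month) from the 'date' column
dates = pd.DatetimeIndex([parse(d).strftime('%Y-%m-01') for d in df['date']])
df.set_index(dates, inplace=True)

# Decompose (monthly data, so one seasonal cycle = 12 observations)
result = seasonal_decompose(df['traffic'], model='multiplicative', period=12)

# Plot
plt.rcParams.update({'figure.figsize':(10,10)})
result.plot().suptitle('Time Series Decomposition of Air Passengers')
plt.show()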
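
The multiplicative model can be spelled out by hand. The sketch below is a classical moving-average decomposition in NumPy (decompose_multiplicative is a hypothetical name): a centred 2×12 moving average for the trend, per-month averages of y / trend for the seasonal factors (rescaled to average 1), and whatever is left as the residual. It is close in spirit to seasonal_decompose but is an illustration, not its actual implementation; like seasonal_decompose with default settings, it leaves the trend undefined (NaN) for the first and last half-period.

```python
import numpy as np

def decompose_multiplicative(y, period=12):
    """Classical multiplicative decomposition y = trend * seasonal * resid (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centred moving average; an even period needs the 2 x period form (half weights at both ends)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode='valid')
    # Average y / trend at each position within the cycle, ignoring the NaN ends
    position = np.arange(n) % period
    detrended = y / trend
    factors = np.array([np.nanmean(detrended[position == k]) for k in range(period)])
    factors /= factors.mean()          # seasonal factors average to 1
    seasonal = factors[position]
    resid = y / (trend * seasonal)     # NaN wherever the trend is undefined
    return trend, seasonal, resid
```

Applied to the data above, trend, seasonal, resid = decompose_multiplicative(df['traffic'].values, period=12) returns three arrays whose product reproduces the series wherever the trend is defined.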



40. Multiple Time Series
You can plot multiple time series that measure the same quantity on the same chart, as shown below.
import pandas as pd
import matplotlib.pyplot as plt

# Import Data
df = pd.read_csv('https://github.com/selva86/datasets/raw/master/mortality.csv')

# Define the upper limit, lower limit, interval of Y axis and colors
y_LL = 100
y_UL = int(df.iloc[:,1:].max().max()*1.1)
y_interval = 400
mycolors = ['tab:red','tab:blue','tab:green','tab:orange']

# Draw Plot and Annotate
fig, ax = plt.subplots(1,1,figsize=(16,9), dpi=80)

columns = df.columns[1:]  # skip the 'date' column
for i, column in enumerate(columns):
    plt.plot(df.date.values, df[column].values, lw=1.5, color=mycolors[i])
    # Label each line just to the right of its last point
    plt.text(df.shape[0]+1, df[column].values[-1], column, fontsize=14, color=mycolors[i])

# Draw Tick lines
for y in range(y_LL, y_UL, y_interval):
    plt.hlines(y, xmin=0, xmax=71, colors='black', alpha=0.3, linestyles="--", lw=0.5)

# Decorations
plt.tick_params(axis="both", which="both", bottom=False, top=False,
                labelbottom=True, left=False, right=False, labelleft=True)

# Lighten borders
plt.gca().spines["top"].set_alpha(.3)
plt.gca().spines["bottom"].set_alpha(.3)
plt.gca().spines["right"].set_alpha(.3)
plt.gca().spines["left"].set_alpha(.3)

plt.title('Number of Deaths from Lung Diseases in the UK (1974-1979)', fontsize=22)
plt.yticks(range(y_LL, y_UL, y_interval), [str(y) for y in range(y_LL, y_UL, y_interval)], fontsize=12)
plt.xticks(range(0, df.shape[0], 12), df.date.values[::12], horizontalalignment='left', fontsize=12)
plt.ylim(y_LL, y_UL)
plt.xlim(-2, 80)
plt.show()
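
The chart above passes the dates as strings, which makes Matplotlib treat the x axis as categorical and is why the labels and gridlines are positioned by row number. A variation, sketched below under the assumption that the data is read with a DatetimeIndex, keeps a real date axis and attaches each label to the last point of its line. The helper plot_with_end_labels is hypothetical, and the sketch builds a Matplotlib Figure directly so it also runs without a display.

```python
import pandas as pd
from matplotlib.figure import Figure

def plot_with_end_labels(df, title='', colors=('tab:red', 'tab:blue', 'tab:green', 'tab:orange')):
    """Draw each column of a DatetimeIndex-ed DataFrame as a line, labelled at its right end."""
    fig = Figure(figsize=(16, 9), dpi=80)
    ax = fig.subplots()
    # zip() stops at the shorter sequence, so at most len(colors) columns are drawn
    for color, column in zip(colors, df.columns):
        ax.plot(df.index, df[column].values, lw=1.5, color=color)
        # Offset the label 5 points to the right of the series' last point
        ax.annotate(column, xy=(df.index[-1], df[column].iloc[-1]),
                    xytext=(5, 0), textcoords='offset points',
                    color=color, fontsize=14, va='center')
    ax.grid(axis='y', alpha=0.3, linestyle='--')
    for spine in ax.spines.values():
        spine.set_alpha(0.3)  # lighten the borders as in the original chart
    ax.set_title(title, fontsize=22)
    return fig, ax

# Toy data for illustration only (not the real mortality figures)
dates = pd.date_range('1974-01-01', periods=6, freq='MS')
toy = pd.DataFrame({'mdeaths': [10, 12, 11, 13, 12, 14],
                    'fdeaths': [5, 6, 5, 7, 6, 7]}, index=dates)
fig, ax = plot_with_end_labels(toy, title='Toy example')
```

With the real file, reading it as pd.read_csv(url, parse_dates=['date'], index_col='date') should give the layout this helper expects, since the original code refers to a 'date' column.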



Source: the WeChat official account 自学气象人


