气ython风雨 发表于 2024-3-25 16:57:35

xarray+eofs:对EOF的简单实现


昨日有读者说想看看EOF和小波分析,近期也在搞xarray的推文,就拿xarray的数据直接做了。

小波分析的话其他博主已经写过pycwt库的实现了,就直接转载了(偷懒)

本想写几种eof库比较的,在pyeof和xeof安装都失败后还是简单写一下eofs

导入库
In :
from eofs.standard import Eof
import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def get_time_space(df, time_dim, lumped_space_dims):
    """
    Get a DataFrame with the dim:

    Parameters
    ----------

    df : pandas.core.frame.DataFrame
      | time | lat | lon | PM2.5 |
      | 2011 | 66| 88| 22    |
      | 2011 | 66| 99| 23    |
      | 2012 | 66| 88| 24    |
      | 2012 | 66| 99| 25    |
      

    time_dim:
      time dim name (str), e.g., "time"
   
    lumped_space_dims:
      a list of the lumped space, e.g., ["lat","lon"]

    Returns
    -------
    dataframe:
      with , e.g.,
      | time | loc1    | loc2    |
      |      | (66,88) | (66,99) |
      | 2011 | 22      | 24      |
      | 2012 | 23      | 25      |
   
    """
    return df.set_index(+lumped_space_dims).unstack(lumped_space_dims)
数据预处理
In :
import xarray as xr

# 数据
f = '/home/mw/input/cru3546/cru_ts4.07.2021.2022.pre.dat.nc'

# 打开数据集
ds = xr.open_dataset(f)
ds = ds.sel(lat=slice(20, 60), lon=slice(80, 130))
ds

Out:



In :
df = ds.to_dataframe().reset_index() # get df from da
df=df.drop(columns=['stn'])
df
Out:


192000 rows × 4 columns

In :
df_data = get_time_space(df, time_dim = "time", lumped_space_dims = ["lat","lon"])
display(df_data.head(5))
print("DataFrame Shape:",df_data.shape)



5 rows × 8000 columns

DataFrame Shape: (24, 8000)

简单绘图函数
In :import matplotlib.pyplot as plt

def visualization(da, pcs, eofs_da, evf):
    n = len(pcs)# 获取模态数
    fig, axs = plt.subplots(n+1, 2, figsize=(20, 6*(n+1)))

    da.mean(dim=["lat","lon"]).plot(ax=axs)
    axs.set_title("average pre")
    da.mean(dim="time").plot(ax=axs)
    axs.set_title("average pre")

    for i in range(1, n+1):
      pc_i = xr.DataArray(pcs[:,i-1],dims=["time"])
      pc_i = pc_i.assign_coords(time=ds.time)
      eof_i = eofs_da.sel(mode=i-1)
      frac = str((evf * 100).round(2))

      axs.plot(pc_i)
      axs.set_title("PC"+str(i)+" ("+frac+"%)")

      img = eof_i.plot(x='lon',y='lat',ax=axs, vmin=-0.75, vmax=0.75, cmap="RdBu_r", add_colorbar=False)
      axs.set_title("EOF"+str(i)+" ("+frac+"%)")
      fig.colorbar(img, ax=axs)

    plt.tight_layout()
    plt.show()

eof计算与简单绘图
In :
from eofs.standard import Eof
from sklearn.preprocessing import StandardScaler
n=4
# 使用StandardScaler对数据进行标准化处理
scaler = StandardScaler()
df_data_scaled = scaler.fit_transform(df_data.values)

# 创建EOFs对象
solver = Eof(df_data_scaled)

# 计算主分量分数(PCs)
s_pcs = solver.pcs(npcs=4, pcscaling=2)

# 计算模态函数(EOFs)
s_eofs =solver.eofs(neofs=4, eofscaling=2)

s_eofs = s_eofs.reshape(4,100,80)

# 创建xarray数组
lon = ds.lon
lat = ds.lat
mode = np.arange(0,n)
s_eofs_ds = xr.DataArray(s_eofs,coords=, dims=["mode", "lon", "lat"])
# 计算解释方差比例
s_evfs = solver.varianceFraction(neigs=4)


visualization(ds['pre'], s_pcs, s_eofs_ds, s_evfs)/opt/conda/lib/python3.7/site-packages/sklearn/utils/extmath.py:770: RuntimeWarning: invalid value encountered in true_divide
updated_mean = (last_sum + new_sum) / updated_sample_count
/opt/conda/lib/python3.7/site-packages/sklearn/utils/extmath.py:709: RuntimeWarning: Degrees of freedom <= 0 for slice.
result = op(x, *args, **kwargs, dtype=np.float64)


带权重eof
In :
coslat = np.array(np.sqrt(np.cos(np.deg2rad(ds.lat)))).reshape((1,80))


# 创建EOFs对象
solver1 = Eof(df_data_scaled.reshape(24,100,80),weights = coslat)

# 计算主分量分数(PCs)
s_pcs1 = solver1.pcs(npcs=4, pcscaling=2)

# 计算模态函数(EOFs)
s_eofs1 =solver.eofs(neofs=4, eofscaling=2)

s_eofs1 = s_eofs.reshape(4,100,80)

# 创建xarray数组
lon = ds.lon
lat = ds.lat
mode = np.arange(0,n)
s_eofs_ds1 = xr.DataArray(s_eofs1,coords=, dims=["mode", "lon", "lat"])
# 计算解释方差比例
s_evfs1 = solver.varianceFraction(neigs=4)


visualization(ds['pre'], s_pcs1, s_eofs_ds1, s_evfs1)



权重和不加权重看起来差异不大(甚至可以说没有),也许是数据量不够大,可自行测试别的长时间序列数据

文章来源于微信公众号:气ython风雨




页: [1]
查看完整版本: xarray+eofs:对EOF的简单实现