Xarray was designed for working seamlessly with multidimensional data in Python.
Xarray works on top of raw NumPy-like multidimensional arrays, and introduces labels in the form of dimensions, coordinates, and attributes.
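As a minimal sketch of what these labels look like in practice, the example below wraps a small NumPy array in an `xarray.DataArray`. The variable names, dimension names, and values are made up for illustration and do not come from any real dataset.

```python
import numpy as np
import xarray as xr

# Raw 2-D NumPy array: 2 time steps x 3 locations (made-up values).
values = np.array([[11.0, 12.5, 9.8],
                   [10.2, 13.1, 8.7]])

temperature = xr.DataArray(
    values,
    dims=("time", "location"),           # dimension names
    coords={                              # coordinate labels per dimension
        "time": ["2024-01-01", "2024-01-02"],
        "location": ["A", "B", "C"],
    },
    attrs={"units": "degC"},             # free-form metadata
    name="temperature",
)

# Select by label instead of by integer position.
at_b = temperature.sel(location="B")
print(at_b.values)  # [12.5 13.1]

# Reduce over a named dimension rather than an axis number.
mean_per_location = temperature.mean(dim="time")
print(mean_per_location.dims)  # ('location',)
```

Selecting with `sel` and reducing with `dim=` are the label-based counterparts of positional indexing and `axis=` in plain NumPy.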
Xarray builds upon and integrates with NumPy and pandas.
The project focuses on functionality and interfaces for labeled data, and relies on other Python libraries for what they already do well: NumPy and pandas for arrays and indexing, Dask for parallel computing, matplotlib for plotting, and so on.
The data model is borrowed from the netCDF file format.
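The following sketch illustrates the integration with NumPy and pandas on a tiny, made-up array. It assumes only that `numpy`, `pandas`, and `xarray` are installed; the array name and values are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.array([[1.0, 4.0], [9.0, 16.0]]),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b"]},
    name="demo",
)

# NumPy universal functions apply element-wise and keep the labels.
roots = np.sqrt(da)
print(type(roots).__name__)  # DataArray
print(roots.dims)            # ('x', 'y')

# Conversion to pandas: the dimensions become levels of a MultiIndex.
series = da.to_series()
print(isinstance(series, pd.Series))  # True
print(series.loc[(20, "b")])          # 16.0
```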
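To make the netCDF-style data model more concrete, here is a small sketch of an `xarray.Dataset`: several named variables that share dimensions and coordinates, plus global attributes. The station data is invented for illustration, and nothing is written to disk.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={
        # Each variable is given as (dimension names, data).
        "temperature": (("time", "station"),
                        np.array([[11.0, 12.5], [10.2, 13.1]])),
        "pressure": (("time", "station"),
                     np.array([[1012.0, 1009.5], [1011.3, 1008.9]])),
    },
    coords={
        "time": [0, 1],
        "station": ["A", "B"],
    },
    attrs={"title": "hypothetical station data"},  # global metadata
)

print(sorted(ds.data_vars))                       # ['pressure', 'temperature']
print(ds.sizes["time"], ds.sizes["station"])      # 2 2

# Both variables share the same coordinates, so one selection applies to all.
station_a = ds.sel(station="A")
print(station_a["pressure"].values)  # [1012.  1011.3]
```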
Link: https://docs.xarray.dev/en/stable/getting-started-guide/installing.html
The required dependencies to install xarray are:
Xarray also has a number of optional dependencies, several of which can improve its performance.
The recommended set of packages to install is: xarray dask netCDF4 bottleneck
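One way to check which of these packages are available in the current environment is sketched below. It uses only the standard library; the helper name `installed` is hypothetical, and the package list simply mirrors the recommended set named above.

```python
from importlib.util import find_spec

# Import names of the recommended packages mentioned in the text.
packages = ["xarray", "dask", "netCDF4", "bottleneck"]


def installed(module_name: str) -> bool:
    """Return True if the top-level module can be located (without importing it)."""
    return find_spec(module_name) is not None


status = {name: installed(name) for name in packages}
for name, ok in status.items():
    print(f"{name:12s} {'found' if ok else 'missing'}")
```

The output depends on the environment, so no expected result is shown here.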
Xarray offers multiple dependency sets:
xarray[io] - I/O operations
xarray[accel] - accelerating xarray
xarray[parallel] - dask arrays
xarray[viz] - visualizations
xarray[complete] - everything above
??? Difference xarray DataArray vs. Dask array
??? Difference between how different formats are stored on disk (netcdf, zarr, grib, hdf5, geotiff)
Best practices for Dask on xarray