I have been using for quite some time, but until now I had only used it sequentially. I had copied the open_mfdataset function to , but without properly trying out the parallel=True option.

Today I spawned some distributed workers using dask-mpi and got a 3x speedup when loading a collection of 100 files on 6 cores.

I should try out dask-jobqueue on the cluster next.

Mastodon @ SUNET