block by ThomasG77 d3e3453fe15aa3080356bc1a3e1606c2

IGN 7z vs zip with some parquet experiments

7z vs zip with IGN data

We downloaded the 7z archive, extracted it, and recompressed its contents as a zip

wget https://data.geopf.fr/telechargement/download/ADMIN-EXPRESS-COG/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03.7z
7z x ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03.7z
zip -r ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03.zip ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03

Comparison of compressed sizes

562M    ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03.7z
693M    ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03.zip
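
As a rough way to quantify the gap, the two sizes listed above can be compared with a few lines of shell. This is only a sketch: the helper name `size_overhead_pct` is made up for illustration, and the inputs are the rounded megabyte figures from the listing, not exact byte counts.

```shell
#!/bin/sh
# Hypothetical helper: percentage by which size $2 exceeds size $1.
# LC_ALL=C keeps awk on a dot decimal separator regardless of the locale.
size_overhead_pct() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# Rounded sizes in MB taken from the listing (7z first, zip second).
size_overhead_pct 562 693
```

With these rounded figures the zip comes out roughly 23% larger than the 7z.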

Test 7z

time ogrinfo /vsi7z/vsicurl/https://labs.webgeodatavore.com/partage/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03.7z/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03/ADMIN-EXPRESS-COG/1_DONNEES_LIVRAISON_2023-05-03/ADECOG_3-1_SHP_WGS84G_FRA/EPCI.shp
INFO: Open of `/vsi7z/vsicurl/https://labs.webgeodatavore.com/partage/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03.7z/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03/ADMIN-EXPRESS-COG/1_DONNEES_LIVRAISON_2023-05-03/ADECOG_3-1_SHP_WGS84G_FRA/EPCI.shp'
      using driver `ESRI Shapefile' successful.
1: EPCI (Polygon)

real    0m36,717s
user    0m30,227s
sys    0m0,529s
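
The long ogrinfo paths used in this comparison follow one pattern: an archive handler (`/vsi7z/` or `/vsizip/`) chained in front of `/vsicurl/` plus the archive URL, followed by the path of the file inside the archive. A minimal sketch of that pattern, where the helper name `vsi_remote_path`, the URL and the inner path are placeholders chosen for illustration:

```shell
#!/bin/sh
# Hypothetical helper: build a chained GDAL virtual path.
#   $1 = archive handler (e.g. vsi7z or vsizip)
#   $2 = URL of the remote archive
#   $3 = path of the file inside the archive
vsi_remote_path() {
  printf '/%s/vsicurl/%s/%s\n' "$1" "$2" "$3"
}

# Placeholder URL and inner path, for illustration only.
vsi_remote_path vsi7z https://example.com/archive.7z data/EPCI.shp
```

The resulting string can be passed to ogrinfo as the datasource, assuming the local GDAL build includes the corresponding handler.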

Test zip

To list the contents of a remote zip

# https://github.com/marcograss/partialzip
partialzip list https://labs.webgeodatavore.com/partage/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03.zip
time ogrinfo /vsizip/vsicurl/https://labs.webgeodatavore.com/partage/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03.zip/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03/ADMIN-EXPRESS-COG/1_DONNEES_LIVRAISON_2023-05-03/ADECOG_3-1_SHP_WGS84G_FRA/EPCI.shp
INFO: Open of `/vsizip/vsicurl/https://labs.webgeodatavore.com/partage/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03.zip/ADMIN-EXPRESS-COG_3-1__SHP_WGS84G_FRA_2023-05-03/ADMIN-EXPRESS-COG/1_DONNEES_LIVRAISON_2023-05-03/ADECOG_3-1_SHP_WGS84G_FRA/EPCI.shp'
      using driver `ESRI Shapefile' successful.
1: EPCI (Polygon)

real    0m0,504s
user    0m0,327s
sys    0m0,017s
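
The `time` output in these tests uses a comma as the decimal separator (French locale), which makes the values awkward to reuse in calculations. The sketch below converts such a value to seconds and computes the ratio between the two `real` timings reported for 7z and zip. The helper names `to_seconds` and `speed_ratio` are assumptions for illustration, and the parsing only covers the `XmY,ZZZs` form shown here.

```shell
#!/bin/sh
# Hypothetical helper: convert a `time` value such as 0m36,717s to seconds.
to_seconds() {
  printf '%s\n' "$1" | tr ',' '.' |
    LC_ALL=C awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Hypothetical helper: ratio between two durations ($1 / $2), one decimal.
speed_ratio() {
  LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", a / b }'
}

# "real" timings copied from the 7z and zip runs.
speed_ratio 0m36,717s 0m0,504s
```

On these single measurements, the zip access is on the order of 70 times faster; a single run each is not a benchmark, so the ratio is indicative only.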

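Summary: the 7z archive is smaller (562M vs 693M), but opening EPCI.shp remotely took about 36.7 s through /vsi7z/ versus about 0.5 s through /vsizip/.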

Bonus: GeoParquet experiment

# Convert each shapefile to a Parquet file named after the lowercased layer (EPCI.shp -> epci.parquet)
for i in ADMIN-EXPRESS-COG_*/ADMIN-EXPRESS-COG/1_*/ADECOG_*/*.shp; do
  filename=$(basename "$i")
  filenamenoext=${filename%.*}
  ogr2ogr -f Parquet "$(echo "$filenamenoext" | tr '[:upper:]' '[:lower:]').parquet" "$i"
done
# scp *.parquet alias_ssh:/mypath/partage/
time ogrinfo /vsicurl/https://labs.webgeodatavore.com/partage/epci.parquet
INFO: Open of `/vsicurl/https://labs.webgeodatavore.com/partage/epci.parquet'
      using driver `Parquet' successful.
1: epci (Multi Polygon)

real    0m0,338s
user    0m0,176s
sys    0m0,030s
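
The core of the name derivation in the conversion loop (strip the directory, strip the extension, lowercase the rest) can be tried without GDAL installed. This is a sketch only: the helper name `parquet_name` and the sample path are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: derive a lowercase .parquet name from a shapefile path.
parquet_name() {
  filename=$(basename "$1")        # drop the directory part
  filenamenoext=${filename%.*}     # drop the extension
  printf '%s.parquet\n' "$(printf '%s' "$filenamenoext" | tr '[:upper:]' '[:lower:]')"
}

# Sample path, for illustration only.
parquet_name some/dir/EPCI.shp
```

A lowercase file name matches the layer name `epci` that ogrinfo reports for the Parquet file.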