An example related to a question on Twitter: https://twitter.com/drewdaraabrams/status/1359933543619547137
Try curl https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN
Among the results, look at:
"total_results": 161,
"total_pages": 17,
"per_page": 10,
"page": 1,
Then, try the per_page option to get everything in one call: curl https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=170
"total_results": 161,
"total_pages": 2,
"per_page": 100,
"page": 1,
We deduce that the API caps per_page at 100.
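A quick way to confirm the cap (a sketch): request an oversized per_page and read back the per_page the API actually applied:

# Ask for 170 per page; the API answers with its real maximum
curl -s "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=170" | jq .per_page
# -> 100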
Let’s compromise and fetch 4 pages of 50 elements each. We could do the same with a page size of 100, but that would take only 2 HTTP calls; for the demo, we lower it to make 4 HTTP calls.
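The page count is just a ceiling division: 161 results at 50 per page gives ceil(161/50) = 4 pages. The same bc trick used in the script below can check it (GNU bc, since this relies on its else extension):

# Ceiling division: 161 results / 50 per page -> 4 pages
echo "if ( 161%50 ) 161/50+1 else 161/50" | bc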
Get all pages manually, just to see the pattern of the URL calls:
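Something like this (a sketch of the four calls, one per page):

curl -s "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=50&page=1"
curl -s "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=50&page=2"
curl -s "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=50&page=3"
curl -s "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=50&page=4"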
Now change the recipe to automate it:
pageoffset=50
# Get page 1 first, so we can count how many paging calls are needed
result1=$(curl -s "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=${pageoffset}&page=1")
# Extract the totals needed to compute the number of pages, and hence the number of API calls and the variables used when looping through each page
tot_result=$(echo "$result1" | jq -r .total_results)
tot_page=$(echo "$result1" | jq -r .total_pages) # total_pages is provided here; you may need to calculate it yourself if the API only returns total_results
calculated_tot_page=$(echo "if ( $tot_result % $pageoffset ) $tot_result/$pageoffset+1 else $tot_result/$pageoffset" | bc) # ceiling division of total_results by pageoffset
# Fetch each page and save it as a separate file
for ((i=1; i<=tot_page; i++)); do
  curl -s "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=${pageoffset}&page=${i}" | jq --slurp '.[0].etablissement[]' >| "/tmp/content${i}.json"
  sleep 0.3 # 0.3s delay between calls so we are not kicked by rate limiting
done
# Merge your JSONs "et voilà!" as we say in French
cat /tmp/content*.json | jq -s . >| /tmp/out.json
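As a quick sanity check (assuming each result corresponds to one establishment), the merged file should contain as many entries as the total_results reported:

# Should print 161, matching total_results above
jq 'length' /tmp/out.json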