
Recipe to get paginated JSON using command-line tools, e.g. curl, jq, bc, cat

curl + jq (with its slurp option) + a bash loop do the job

An example related to a question on Twitter: https://twitter.com/drewdaraabrams/status/1359933543619547137

Try curl https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN

In the result, look at

"total_results": 161,
"total_pages": 17,
"per_page": 10,
"page": 1,

Then, try with the per_page option to get everything in one call: curl "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=170"

"total_results": 161,
"total_pages": 2,
"per_page": 100,
"page": 1,

We deduce that the API caps per_page at 100.
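To double-check the cap (a quick optional verification; per_page is the field shown in the responses above):

curl -s "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=170" | jq .per_page
# prints 100 even though we asked for 170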

Let’s make a compromise and get 4 pages of 50 elements (161 results / 50 per page = 3 full pages + 1 partial page = 4 pages). We could do the same with per_page=100, but that would make only 2 HTTP calls; for the demo, we lower it to make 4 HTTP calls.

Get all pages manually just to see the pattern of the URL calls, written out below.
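The pattern, spelled out (same endpoint as above; only the page parameter changes):

curl "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=50&page=1"
curl "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=50&page=2"
curl "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=50&page=3"
curl "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=50&page=4"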

Now change the recipe to automate it:

pageoffset=50
# Get the result of page 1 to count pages for the paging calls
result1=$(curl -s "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=${pageoffset}&page=1")

# Get/calculate the number of pages, hence the number of API calls, and the variables to use when looping through each page
tot_result=$(echo "$result1" | jq -r .total_results)
tot_page=$(echo "$result1" | jq -r .total_pages) # Here total_pages is available. You may need to calculate it if you only get total_results
calculated_tot_page=$(echo "if ( $tot_result%$pageoffset ) $tot_result/$pageoffset+1 else $tot_result/$pageoffset" | bc)
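# Sanity check of the bc fallback with the numbers above (161 results, 50 per page):
#   echo "if ( 161%50 ) 161/50+1 else 161/50" | bc   # prints 4: 3 full pages + 1 partial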

# Fetch each page and save it as a separate file
for ((i=1;i<=tot_page;i++));
    do curl -s "https://entreprise.data.gouv.fr/api/sirene/v1/full_text/MONTPELLIERAIN?per_page=${pageoffset}&page=${i}" | jq --slurp '.[0].etablissement[]' >| "/tmp/content${i}.json";
       sleep 0.3; # Add a 0.3s delay to avoid being kicked by rate limiting
done;

# Merge your JSON files and "et voilà!" as we say in French
cat /tmp/content*.json | jq -s . >| /tmp/out.json
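Optionally, verify the merge (a quick sanity check, assuming each page returned its full batch): the merged array should contain exactly total_results items, 161 here.

jq length /tmp/out.json
# expect 161, matching total_results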