block by palewire 7875b515cf48d69922933b7897b61566

Python Parallelization Example

This script demonstrates how to upload multiple PDF documents to DocumentCloud in parallel across your CPU cores using Python. It reads a list of PDF URLs from a CSV file, uploads each document to DocumentCloud, and saves the resulting document IDs back to the CSV.

The script was written to demonstrate the parallelize function, which runs a function over a list of items concurrently and can significantly speed up jobs made up of many independent operations. In practice, our data team at Reuters uses this function to spread tens of thousands of operations across dozens of computer cores.
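To make the CSV round trip described above concrete, here is a minimal serial sketch. It is not the code in main.py. The column names `url` and `document_id`, the helper `add_document_ids`, and the `fake_upload` placeholder are all assumptions made for illustration; a real run would call DocumentCloud where `fake_upload` sits.

```python
import csv
import io


def fake_upload(url):
    """Hypothetical placeholder for the real DocumentCloud upload call."""
    # Build a stand-in ID from the file name so the sketch runs offline
    return "fake-id-" + url.rsplit("/", 1)[-1]


def add_document_ids(in_file, out_file, upload=fake_upload):
    """Read rows with a `url` column and write them back with a `document_id` column.

    Both column names are assumptions; the real pdfs.csv may use different ones.
    """
    reader = csv.DictReader(in_file)
    rows = list(reader)
    fieldnames = list(reader.fieldnames or [])
    if "document_id" not in fieldnames:
        fieldnames.append("document_id")
    # This loop is the step a parallel version would spread across workers
    for row in rows:
        row["document_id"] = upload(row["url"])
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return rows


# In-memory files stand in for pdfs.csv so nothing on disk is touched
sample = io.StringIO("url\nhttps://example.com/a.pdf\nhttps://example.com/b.pdf\n")
output = io.StringIO()
rows = add_document_ids(sample, output)
print(output.getvalue())
```

Each upload is independent of the others, which is what makes the loop a good candidate for parallel execution.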

A more basic example would be something like the following:

from main import parallelize

def your_function(item):
    # Process the item
    return item * 2

items = [1, 2, 3, 4, 5]

results = parallelize(items, your_function)
print(results)  # Output: [2, 4, 6, 8, 10]
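For readers curious what a helper with this call signature might look like inside, here is one possible sketch built on the standard library's concurrent.futures module. It is an assumption, not the implementation in main.py: the name `parallelize_sketch` and the `max_workers` and `use_threads` parameters are hypothetical.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def parallelize_sketch(items, func, max_workers=None, use_threads=False):
    """Apply func to every item with a pool of workers, keeping input order.

    Hypothetical stand-in for main.parallelize; the real function may differ.
    """
    items = list(items)
    if not items:
        return []
    # Default to one worker per CPU core, but never more workers than items
    workers = max_workers or min(len(items), os.cpu_count() or 1)
    # Processes use multiple cores; threads are a lighter option for I/O-bound work
    executor_class = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_class(max_workers=workers) as executor:
        # executor.map yields results in the same order as the inputs
        return list(executor.map(func, items))


def double(item):
    # With processes, the worker function must be defined at module level
    # so that it can be pickled and sent to the child processes
    return item * 2
```

Calling `parallelize_sketch([1, 2, 3, 4, 5], double)` returns `[2, 4, 6, 8, 10]`. When using the process-based default, make that call from under an `if __name__ == "__main__":` guard, because platforms that start workers with a fresh interpreter (Windows and macOS by default) re-import the calling module in each child.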

main.py

pdfs.csv

pyproject.toml