pylateral
Simple multi-threaded task processing in Python
Example
```python
import urllib.request

import pylateral


# Inside a task_pool(), each call is added to the pool and runs on a pool thread.
@pylateral.task
def request_and_print(url):
    with urllib.request.urlopen(url) as response:
        print(response.read())


URLS = [
    "https://www.nytimes.com/",
    "https://www.cnn.com/",
    "https://europe.wsj.com/",
    "https://www.bbc.co.uk/",
    "https://some-made-up-domain.com/",
]

# Leaving this block waits until every queued task has finished.
with pylateral.task_pool():
    for url in URLS:
        request_and_print(url)

print("Complete!")
```
What's going on here
- def request_and_print(url) is a pylateral task that, when called, runs on a task pool thread rather than on the main thread.
- with pylateral.task_pool() allocates threads and a task pool. The context manager exits only once no tasks remain, so leaving the with block waits for all queued work to finish.
- Each call to request_and_print(url) adds that task to the task pool. Meanwhile, the main thread continues execution.
- The "Complete!" line is printed only after the pool threads have finished every request_and_print() invocation.
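For comparison, the same workflow can be sketched directly with the standard library's concurrent.futures.ThreadPoolExecutor, which pylateral provides an interface to. This is a rough equivalent under our own assumptions, not pylateral's implementation: the pool size (max_workers=4) is arbitrary, and fake_fetch is a hypothetical stand-in for the network call so the sketch runs offline.

```python
import concurrent.futures
import time

URLS = [
    "https://www.nytimes.com/",
    "https://www.cnn.com/",
    "https://europe.wsj.com/",
    "https://www.bbc.co.uk/",
]


def fake_fetch(url):
    """Hypothetical stand-in for urllib.request.urlopen(url).read()."""
    time.sleep(0.1)  # simulated network wait; like blocking I/O, sleep releases the GIL
    return f"<html>{url}</html>"


def request_and_print(url):
    print(fake_fetch(url))
    return url


def main():
    # Entering the executor plays the role of `with pylateral.task_pool():`.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # submit() hands each call to a pool thread and returns a Future at once,
        # so the main thread keeps looping while the requests are in flight.
        futures = [executor.submit(request_and_print, url) for url in URLS]
    # Leaving the block calls executor.shutdown(wait=True), which blocks until
    # every submitted task has finished.
    completed = [future.result() for future in futures]  # re-raises task errors
    print("Complete!")
    return completed


if __name__ == "__main__":
    main()
```

Compared with the pylateral example above, the caller here manages the executor and submits each call explicitly; that bookkeeping is what the @pylateral.task decorator is meant to hide.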
To learn more about the features of pylateral, check out the usage section.
Background
A couple of years ago, I inherited my company's codebase for getting data into our data warehouse using an ELT approach (extracts and loads done in Python, transforms done in dbt/SQL). The codebase has dozens of Python scripts that integrate first-party and third-party data from databases, FTPs, and APIs and that run on a scheduler (typically daily or hourly). The scripts I inherited were single-threaded, procedural glue code that spent most of its time waiting on network I/O. This got my company pretty far!
As my team and I added more and more integrations with more and more data, we wanted faster scripts, both to shorten our dev cycles and to cut our multi-hour nightly jobs down to minutes. Because our scripts were network-bound, multi-threading was a good way to accomplish this, so I looked into concurrent.futures and asyncio, but I decided against these options because:
- It wasn't immediately apparent how to adapt my codebase to these libraries without fundamental changes to our execution platform, a ground-up rewrite of our scripts, significant multi-threading code in each script, or some combination of the three.
- I believe our procedural-style glue code is easy to comprehend, which I think makes it easier to support a wide variety of programs at scale.
And so, I designed pylateral, a simple interface to concurrent.futures.ThreadPoolExecutor for extract-and-load workloads. The design considerations of this interface include:
- The usage is minimally invasive to the original, un-threaded approach of my company's codebase. (As a result, teaching the library has been fairly straightforward despite the paradigm shift to multi-threading.)
- The @pylateral.task decorator should wrap a homogeneous function that is called many times with different parameters. The body of that function should be primarily I/O, because that is where Python multi-threading yields concurrency gains (CPython releases the GIL while a thread waits on blocking I/O).
- If no pylateral.task_pool() context manager has been entered, or if pylateral has been disabled by an environment variable, the @pylateral.task decorator does nothing and the code runs serially.
- While it's possible to return a value from a @pylateral.task function, I encourage my team to use the decorator to start-and-complete work; think of writing "embarrassingly parallel" functions that can be "mapped".
Why not other libraries?
I think that pylateral meets an unmet need in Python's concurrency ecosystem: a simple way to gain the benefits of multi-threading without radically transforming either mindset or codebase.
That said, I don't think pylateral is a silver bullet. See my comparison of pylateral against other concurrency offerings.