Performing an Initial Upload

When you begin using Identity, you will usually want to upload all existing patient records into the system. For large systems, potentially with millions of patients, this initial load can take a significant amount of time. This article provides tips to streamline the process.

Parallelism

For large batches of patients, uploading records one at a time will take an impractically long time.

For example, assume:

  • 50M records
  • 250ms / upload (this is an illustrative example; actual rates will vary)

It would take almost 145 days to upload all the records!
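
For reference, here is the arithmetic behind that figure as a quick script (the record count and per-upload time are the illustrative numbers above, not measured values):

Sequential Time Estimate (Python)
    records = 50_000_000
    seconds_per_upload = 0.250  # 250ms per upload (illustrative)

    total_seconds = records * seconds_per_upload  # 12,500,000 seconds
    print(total_seconds / (60 * 60 * 24))         # ~144.7 days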

Instead, you’ll want to submit multiple requests at the same time, i.e., in parallel.

If, for example, eight records are uploaded four at a time, the work takes two rounds of 250ms each, so the total time is only about 500ms, plus a little bit of processing overhead.
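
Here is a minimal sketch of that arithmetic, assuming every upload takes the same 250ms and every worker stays busy:

Parallel Time Estimate (Python)
    import math

    def estimated_seconds(records, workers, seconds_per_upload=0.250):
        # Wall-clock time is the number of "rounds" (records divided by
        # concurrent workers, rounded up) multiplied by the per-upload time.
        return math.ceil(records / workers) * seconds_per_upload

    print(estimated_seconds(8, 4))           # 0.5 seconds -- the example above
    print(estimated_seconds(50_000_000, 4))  # 3,125,000 seconds (~36 days) with only four workers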

With proper optimization, the Identity API can handle up to 8 million uploads per day when requests are submitted in parallel, cutting the time for the initial load in this example down to roughly a week.
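
The duration follows directly from the daily throughput; here is a quick check using the figures above (treating 8 million per day as a best-case sustained rate):

Initial Load Duration (Python)
    records = 50_000_000
    uploads_per_day = 8_000_000  # upper bound quoted above; sustained rates may be lower

    # At the maximum daily rate, the full initial load takes about 6.25 days.
    print(records / uploads_per_day)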

It is important, however, that you not overwhelm the system with too many parallel uploads at once. This can paradoxically cause the system to respond more slowly, and can even result in a “too many requests” error.

Using a Thread Pool

There are many ways to execute parallel requests, but one common design pattern is a thread pool. In this pattern, your application manages a “pool” of worker threads or processes, each capable of running a task (such as uploading a record) in parallel. In the parallelization example above, four uploads ran at the same time, which would require four workers.

When a worker is free, the application gives it the next record to upload. This continues until all tasks have been processed.

By maintaining a stable pool of workers, the application can avoid the overhead of constantly creating and cleaning up threads. You can also adjust the pool size to control the number of concurrent requests.

Many programming languages offer built-in utilities or supplemental packages for managing a thread pool. For example:

Thread Pool (Python)
    from concurrent.futures import ThreadPoolExecutor

    # records_list and upload_record are defined elsewhere in your application.
    # The executor runs up to four uploads at a time and waits for all of them
    # to finish before the "with" block exits.
    with ThreadPoolExecutor(max_workers=4) as executor:
        for record in records_list:
            executor.submit(upload_record, record)
Thread Pool (C#)
    using System.Threading;

    // recordsList is defined elsewhere; UploadRecord must match the
    // WaitCallback signature: void UploadRecord(object state).
    // Cap the pool at four worker threads and four I/O completion threads,
    // then queue one upload work item per record.
    ThreadPool.SetMaxThreads(4, 4);
    foreach (var record in recordsList) {
        ThreadPool.QueueUserWorkItem(new WaitCallback(UploadRecord), record);
    }

Optimizing Throughput

Once you have the basics of the thread pool established, you can optimize it to maximize throughput. Finding the optimum rate will require some experimentation; there is not a precise formula to follow. In general, you can control:

  • Number of worker threads
  • Periodic delays

If you send too many requests, you’ll swamp the system, risking “too many requests” errors or slower processing. The sketch below shows one way to apply these controls.
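
As a starting point for that experimentation, here is a minimal sketch of a throttled upload loop. It reuses the upload_record and records_list names from the earlier example; MAX_WORKERS, BATCH_SIZE, and BATCH_PAUSE_SECONDS are illustrative knobs to tune, not values prescribed by the Identity API.

Throttled Thread Pool (Python)
    import time
    from concurrent.futures import ThreadPoolExecutor, as_completed

    MAX_WORKERS = 50           # concurrent requests; keep at or below the recommended maximum of 100
    BATCH_SIZE = 1_000         # records submitted between pauses (illustrative)
    BATCH_PAUSE_SECONDS = 1.0  # periodic delay to avoid swamping the system (illustrative)

    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        for start in range(0, len(records_list), BATCH_SIZE):
            batch = records_list[start:start + BATCH_SIZE]
            futures = [executor.submit(upload_record, record) for record in batch]
            for future in as_completed(futures):
                future.result()  # surface any upload errors
            time.sleep(BATCH_PAUSE_SECONDS)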

We recommend that you not exceed 100 concurrent requests. For more information, see Maximizing Throughput.