
capstone.utils.downloader

handle_download(url, dest_path, num_threads=4)

Downloads a file from the specified URL to the given destination path.

The function downloads the file in parallel chunks when the server supports byte-range requests, the file size is known, and num_threads is greater than 1. Otherwise, it falls back to a single-connection download.
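The choice between the two strategies can be sketched as a small predicate over the HEAD response headers. This is a minimal illustration, not the module's code: the name `can_download_in_parallel` is hypothetical, and treating a non-numeric `Content-Length` as "size unknown" is an assumption added here. The sketch also takes a plain mapping, so header names are case-sensitive, unlike the case-insensitive headers a real HTTP client usually provides.

```python
from typing import Mapping


def can_download_in_parallel(headers: Mapping[str, str], num_threads: int) -> bool:
    """Return True when a chunked, multi-threaded download looks possible."""
    # Assumption: a missing or non-numeric Content-Length means "size unknown".
    try:
        size = int(headers.get("Content-Length", "0"))
    except ValueError:
        size = 0
    # Servers advertise byte-range support with "Accept-Ranges: bytes".
    accepts_ranges = headers.get("Accept-Ranges", "none").lower() == "bytes"
    return accepts_ranges and size > 0 and num_threads > 1


# Range support and a known size: parallel download is possible.
print(can_download_in_parallel({"Content-Length": "1024", "Accept-Ranges": "bytes"}, 4))
# No Accept-Ranges header: fall back to a single connection.
print(can_download_in_parallel({"Content-Length": "1024"}, 4))
```

Keeping the check separate from the download itself makes the fallback rule easy to test without any network access.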

If the download fails, the error is logged, any partially downloaded file is removed, and the exception is re-raised.
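The failure handling can be illustrated in isolation. In the sketch below, `run_with_cleanup` and `failing_fetch` are hypothetical names used only for demonstration; the real function wraps its own download helpers instead of taking a callable.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable

LOGGER = logging.getLogger(__name__)


def run_with_cleanup(dest_path: Path, fetch: Callable[[Path], None]) -> None:
    """Call fetch(dest_path); on failure, log, delete the partial file, re-raise."""
    try:
        fetch(dest_path)
    except Exception as exc:
        LOGGER.error("Download failed: %s", exc)
        # missing_ok avoids a second error if nothing was written yet (Python 3.8+).
        dest_path.unlink(missing_ok=True)
        raise


def failing_fetch(path: Path) -> None:
    # Hypothetical stand-in for a download that dies halfway through.
    path.write_bytes(b"partial data")
    raise ConnectionError("connection dropped")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "file.bin"
    try:
        run_with_cleanup(target, failing_fetch)
    except ConnectionError:
        pass  # the caller still sees the original exception
    leftover = target.exists()
    print(leftover)  # False: the partial file was removed
```

Note that `except Exception` does not catch `KeyboardInterrupt`, which derives from `BaseException`, so an interrupted download skips this cleanup path.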

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| url | str | The URL of the file to download. | required |
| dest_path | Path | The local path where the downloaded file will be saved. | required |
| num_threads | int | Number of parallel threads to use for downloading. Defaults to 4. | 4 |

Raises:

| Type | Description |
|---|---|
| Exception | If the download fails for any reason other than KeyboardInterrupt. |

Notes
  • In testing, more than 4 threads caused 503 errors on some servers.
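One way to respect this limit is to cap the thread count before splitting the file into byte ranges. The sketch below is an assumption about how such planning could look. `plan_ranges` and `MAX_THREADS` are hypothetical names, and the module's actual chunking in `_parallel_downloader` is not shown on this page.

```python
from typing import List, Tuple

MAX_THREADS = 4  # assumption: cap taken from the note above, not a documented constant


def plan_ranges(size: int, num_threads: int) -> List[Tuple[int, int]]:
    """Split `size` bytes into inclusive (start, end) ranges, one per thread."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Never use more threads than the cap, nor more threads than bytes.
    threads = max(1, min(num_threads, MAX_THREADS, size))
    chunk = size // threads
    ranges = []
    for i in range(threads):
        start = i * chunk
        # The last range absorbs the remainder so every byte is covered.
        end = size - 1 if i == threads - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges


# Requesting 8 threads is capped to 4 ranges of 250 bytes each.
for start, end in plan_ranges(1000, 8):
    print(f"Range: bytes={start}-{end}")
```

Each tuple maps directly onto an HTTP `Range: bytes=start-end` request header, where both bounds are inclusive.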
Source code in capstone/utils/downloader.py
def handle_download(url: str, dest_path: Path, num_threads: int = 4) -> None:
    """
    Downloads a file from the specified URL to the given destination path.

    The function downloads the file in parallel chunks when the server supports byte-range
    requests, the file size is known, and num_threads is greater than 1. Otherwise, it falls
    back to a single-connection download.

    If the download fails, the error is logged, any partially downloaded file is removed, and
    the exception is re-raised.

    Args:
        url (str): The URL of the file to download.
        dest_path (Path): The local path where the downloaded file will be saved.
        num_threads (int, optional): Number of parallel threads to use for downloading. Defaults to 4.

    Raises:
        Exception: If the download fails for any reason other than KeyboardInterrupt.

    Notes:
        - In testing, more than 4 threads caused 503 errors on some servers.
    """
    try:
        # fetch headers with HEAD request to check for byte-range support and get file size
        _head = head(url, allow_redirects=True)
        _head.raise_for_status()

        size_str: str = _head.headers.get("Content-Length", "0")
        size: int = int(size_str)
        accept_ranges: str = _head.headers.get("Accept-Ranges", "none")

        # if we can use parallel downloading, do so
        if accept_ranges.lower() == "bytes" and size > 0 and num_threads > 1:
            _parallel_downloader(url, dest_path, size, num_threads)
        # otherwise fall back to single connection download
        else:
            _single_downloader(url, dest_path, size)

        LOGGER.info(f"Download complete: {dest_path}")

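    except Exception as e:
        # KeyboardInterrupt derives from BaseException, not Exception, so it is never
        # caught here; it propagates to the caller without logging or cleanup.
        LOGGER.error(f"Download failed: {e}")
        if dest_path.exists():
            dest_path.unlink()
        raise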