capstone.utils.downloader

`handle_download(url, dest_path, num_threads=4)`

Downloads a file from the specified URL to the given destination path.

The function downloads the file in parallel chunks when the server advertises byte-range support (`Accept-Ranges: bytes`), reports a positive `Content-Length`, and `num_threads` is greater than 1. Otherwise, it falls back to a single-connection download.
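As an illustration of what parallel chunking involves, the sketch below divides a file of known size into contiguous byte ranges and builds the matching HTTP `Range` header for each. The helper names `split_byte_ranges` and `range_header` are hypothetical; the source of `_parallel_downloader` is not shown on this page, so this is one plausible scheme, not necessarily the one the module uses.

```python
def split_byte_ranges(size: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, size) into up to `parts` contiguous, inclusive byte ranges.

    Hypothetical helper: the real _parallel_downloader may divide work differently.
    """
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")
    parts = min(parts, size)  # never create empty ranges
    base, extra = divmod(size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges absorb one leftover byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1  # HTTP Range end offsets are inclusive
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> dict[str, str]:
    """Build the request header asking for bytes start..end (inclusive)."""
    return {"Range": f"bytes={start}-{end}"}


print(split_byte_ranges(10, 4))  # [(0, 2), (3, 5), (6, 7), (8, 9)]
```

Each worker would then request its own range and write the response at the matching offset in the destination file.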

If the download fails, the error is logged, the partially downloaded file is removed, and the exception is re-raised.
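The cleanup behaviour can be tried without a network connection. The following is a minimal sketch of the same log, delete, re-raise pattern. `write_with_cleanup`, `failing_writer`, and the logger name are hypothetical stand-ins for illustration, not part of `capstone.utils.downloader`.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable

LOGGER = logging.getLogger("downloader_sketch")  # hypothetical logger name


def write_with_cleanup(dest_path: Path, writer: Callable[[Path], None]) -> None:
    """Run writer(dest_path); on failure, log, delete any partial file, re-raise."""
    try:
        writer(dest_path)
    except Exception as exc:
        LOGGER.error("Download failed: %s", exc)
        dest_path.unlink(missing_ok=True)  # requires Python 3.8+
        raise


def failing_writer(path: Path) -> None:
    # Simulates a transfer that dies after some bytes have been written.
    path.write_bytes(b"partial")
    raise ConnectionError("simulated network drop")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "file.bin"
    try:
        write_with_cleanup(target, failing_writer)
    except ConnectionError:
        pass  # the caller still sees the original exception
    leftover = target.exists()

print(leftover)  # False: the partial file was removed
```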

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `url` | `str` | The URL of the file to download. | required |
| `dest_path` | `Path` | The local path where the downloaded file will be saved. | required |
| `num_threads` | `int` | Number of parallel threads to use for downloading. Defaults to 4. | `4` |

Raises:

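| Type | Description |
| --- | --- |
| `Exception` | Any exception raised during the download is logged and re-raised after the partial file is removed. `KeyboardInterrupt` is not a subclass of `Exception`, so it is not handled here. |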
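The `KeyboardInterrupt` exclusion follows from Python's exception hierarchy and needs no explicit check. The short demonstration below uses a hypothetical helper, `caught_by_except_exception`, to show that an `except Exception` clause does not intercept `KeyboardInterrupt`. As far as the listing on this page shows, this also means an interrupted download leaves any partial file in place.

```python
def caught_by_except_exception(exc: BaseException) -> bool:
    """Return True if `except Exception` would intercept `exc`."""
    try:
        raise exc
    except Exception:
        return True
    except BaseException:
        # KeyboardInterrupt, SystemExit and GeneratorExit land here.
        return False


print(caught_by_except_exception(ValueError("bad header")))  # True
print(caught_by_except_exception(KeyboardInterrupt()))       # False
```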

Notes

In testing, using more than 4 threads caused HTTP 503 (Service Unavailable) errors on some servers.

Source code in capstone/utils/downloader.py

def handle_download(url: str, dest_path: Path, num_threads: int = 4) -> None:
    """
    Downloads a file from the specified URL to the given destination path.

    The function downloads the file in parallel chunks when the server advertises byte-range
    support (Accept-Ranges: bytes), reports a positive Content-Length, and num_threads is
    greater than 1. Otherwise, it falls back to a single-connection download.

    If the download fails, the error is logged, the partially downloaded file is removed, and
    the exception is re-raised.

    Args:
        url (str): The URL of the file to download.
        dest_path (Path): The local path where the downloaded file will be saved.
        num_threads (int, optional): Number of parallel threads to use for downloading. Defaults to 4.

    Raises:
        Exception: Re-raised after logging and cleanup if the download fails. KeyboardInterrupt
            is not a subclass of Exception, so it is not handled here.

    Notes:
        - In testing, more than 4 threads caused 503 errors on some servers.
    """
    try:
        # fetch headers with HEAD request to check for byte-range support and get file size
        _head = head(url, allow_redirects=True)
        _head.raise_for_status()

        size_str: str = _head.headers.get("Content-Length", "0")
        size: int = int(size_str)
        accept_ranges: str = _head.headers.get("Accept-Ranges", "none")

        # if we can use parallel downloading, do so
        if accept_ranges.lower() == "bytes" and size > 0 and num_threads > 1:
            _parallel_downloader(url, dest_path, size, num_threads)
        # otherwise fall back to single connection download
        else:
            _single_downloader(url, dest_path, size)

        LOGGER.info(f"Download complete: {dest_path}")

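    except Exception as e:
        # KeyboardInterrupt derives from BaseException, not Exception, so it never
        # reaches this handler; it propagates to the caller without cleanup.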
        LOGGER.error(f"Download failed: {e}")
        if dest_path.exists():
            dest_path.unlink()
        raise
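For unit testing without a live server, the header check can be isolated into a pure function. The sketch below restates the condition from the listing above under a hypothetical name, `can_download_in_parallel`. It takes a plain mapping, which, unlike the case-insensitive headers object returned by `requests`, matches header names case-sensitively.

```python
from typing import Mapping


def can_download_in_parallel(headers: Mapping[str, str], num_threads: int) -> bool:
    """Mirror the strategy check: range support, known size, more than one thread."""
    size = int(headers.get("Content-Length", "0"))
    accept_ranges = headers.get("Accept-Ranges", "none")
    return accept_ranges.lower() == "bytes" and size > 0 and num_threads > 1


# Server supports ranges and reports a size: parallel download is possible.
print(can_download_in_parallel({"Accept-Ranges": "bytes", "Content-Length": "1024"}, 4))  # True
# Missing Content-Length is treated as size 0, forcing the single-connection path.
print(can_download_in_parallel({"Accept-Ranges": "bytes"}, 4))  # False
```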