pycognaize.document.snapshot_downloader.SnapshotDownloader

class SnapshotDownloader[source]

Bases: object

A class for downloading snapshots from an S3 bucket to a local destination.

Attributes:

PREFIX (str): The prefix used to indicate S3 paths, which is ‘s3://’.

Methods:
download(snapshot_path, destination_path, exclude=None, include=None, continue_token=None):

Downloads snapshots from an S3 bucket to a local destination.

Private Methods:
_get_parts_from_path(path):

Extracts the bucket name and path from an S3 path.

_init_s3_objects():

Initializes the S3 client and resource objects.

_get_page_iterator(bucket_name, continue_token, paginator, path_without_bucket):

Returns an iterator for paginating through S3 bucket contents.

_get_relogined_page_iterator(bucket_name, next_token, pagination_config, path_without_bucket):

Returns a page iterator over S3 bucket contents, created after reauthenticating.

_relogin():

Performs reauthentication for AWS credentials.

_copy_objects_from_page(bucket_name, exclude, page, snapshot_path, destination_path):

Copies objects from an S3 page to the destination.

_copy_file_to_dest(s3_object, snapshot_path, destination_path):

Copies a single S3 object to the destination.

_write_file(path, file_data):

Writes file data to a local path.

_should_exclude(file_path, exclude):

Determines if a file path should be excluded from copying.
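The path-splitting and exclusion helpers listed above are not documented further on this page, so the following is only a minimal sketch of what such helpers might look like. The names `split_s3_path` and `should_exclude` are hypothetical, and glob-style matching via `fnmatch` is an assumption; the pattern semantics actually used by `SnapshotDownloader` may differ.

```python
from fnmatch import fnmatch

PREFIX = "s3://"  # same value as the documented SnapshotDownloader.PREFIX


def split_s3_path(path):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key').

    Hypothetical stand-in for _get_parts_from_path; not the library's code.
    """
    if not path.startswith(PREFIX):
        raise ValueError(f"Expected a path starting with {PREFIX!r}: {path!r}")
    bucket, _, key = path[len(PREFIX):].partition("/")
    return bucket, key


def should_exclude(file_path, exclude=None, include=None):
    """Return True if file_path matches an exclude pattern and no include pattern.

    Assumes glob-style patterns; the real matching rules are not documented.
    """
    if any(fnmatch(file_path, pattern) for pattern in include or []):
        return False  # an include pattern overrides any exclusion
    return any(fnmatch(file_path, pattern) for pattern in exclude or [])
```

In this sketch, `include` takes precedence over `exclude`, which mirrors the documented wording "include even if excluded".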

Methods

download

Downloads snapshots from an S3 bucket to a local destination.

Attributes

PREFIX

download(snapshot_path, destination_path, exclude=None, include=None, continue_token=None)[source]

Downloads snapshots from an S3 bucket to a local destination.

This method downloads a snapshot from an S3 bucket to a specified local destination path. Optional exclusion patterns skip specific files, and a continuation token lets the download resume from a given point.

Args:

snapshot_path (str): The S3 path of the snapshot to download.

destination_path (str): The local destination directory where snapshots will be saved.

exclude (list, optional): A list of file path patterns to exclude from copying.

include (list, optional): A list of file path patterns to include even if excluded.

continue_token (str, optional): An optional continuation token for resuming downloads.

Returns:

None

This method initiates the download of snapshots from the specified S3 path to the local destination path. It handles pagination and reauthentication as needed. Progress and status information is printed during the download process.
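The pagination and continuation-token behaviour described above can be illustrated with a boto3 `list_objects_v2` paginator. This is a sketch under assumptions rather than the library's implementation: `iter_object_keys` is a hypothetical helper, and it takes the paginator as an argument so that it can be exercised without AWS access.

```python
def iter_object_keys(paginator, bucket_name, prefix, continue_token=None):
    """Yield the key of every object listed under prefix.

    `paginator` is expected to behave like
    boto3.client("s3").get_paginator("list_objects_v2").
    """
    pagination_config = {}
    if continue_token:
        # boto3 paginators resume from a token passed as StartingToken.
        pagination_config["StartingToken"] = continue_token
    pages = paginator.paginate(
        Bucket=bucket_name,
        Prefix=prefix,
        PaginationConfig=pagination_config,
    )
    for page in pages:
        # A page with no matching objects carries no "Contents" entry.
        for s3_object in page.get("Contents", []):
            yield s3_object["Key"]
```

With real credentials, the paginator would typically come from `boto3.client("s3").get_paginator("list_objects_v2")`.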

Note:
  • The ‘snapshot_path’ should start with ‘s3://’.

  • The ‘destination_path’ will be created if it doesn’t exist.

Raises:

ValueError: If ‘snapshot_path’ doesn’t start with ‘s3://’.

Parameters:
  • snapshot_path (str)

  • destination_path (str)

  • continue_token (str)
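A minimal usage sketch based on the signature and the `ValueError` contract documented above. The wrapper `fetch_snapshot` is hypothetical, the exclusion pattern is a placeholder, and the sketch assumes that `SnapshotDownloader` can be instantiated without arguments and that AWS credentials are already configured; this page confirms neither.

```python
def fetch_snapshot(snapshot_path, destination_path):
    """Download a snapshot, skipping files that match a placeholder pattern."""
    # Imported lazily so this sketch can be loaded without pycognaize installed.
    from pycognaize.document.snapshot_downloader import SnapshotDownloader

    downloader = SnapshotDownloader()  # assumption: no constructor arguments
    try:
        downloader.download(
            snapshot_path=snapshot_path,
            destination_path=destination_path,
            exclude=["*.png"],  # placeholder; matching rules are not documented
        )
    except ValueError as error:
        # Documented: raised when snapshot_path does not start with 's3://'.
        print(f"Invalid snapshot path: {error}")
        return False
    return True
```

A call might look like `fetch_snapshot("s3://example-bucket/snapshots/example", "./snapshots")`, where the bucket and paths are placeholders.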