Repository

class wrgl.repository.Repository(repo_uri, client_id, client_secret=None)

Represents the HTTP API that wraps a hosted Wrgl repository

Parameters
  • repo_uri (str) – the URI of the repository

  • client_id (str) – Keycloak client id

  • client_secret (str) – Keycloak client secret
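A minimal sketch of constructing the client from the signature above. The environment variable names `WRGL_CLIENT_ID` and `WRGL_CLIENT_SECRET` and both helper names are assumptions made for illustration; they are not part of the library. The import is deferred so the helpers can be defined without the package installed.

```python
import os


def credentials_from_env(env=None):
    """Collect keyword arguments for Repository from environment variables.

    WRGL_CLIENT_ID and WRGL_CLIENT_SECRET are hypothetical variable names.
    """
    env = os.environ if env is None else env
    return {
        "client_id": env["WRGL_CLIENT_ID"],
        # client_secret defaults to None in the documented signature.
        "client_secret": env.get("WRGL_CLIENT_SECRET"),
    }


def open_repository(repo_uri, env=None):
    """Build a Repository for repo_uri using credentials taken from env."""
    # Deferred import: the wrgl package is only needed when this is called.
    from wrgl.repository import Repository

    return Repository(repo_uri, **credentials_from_env(env))
```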

get_refs()

Get references as a mapping from reference name to commit checksum

Return type

dict
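As an illustration of working with the returned mapping, the sketch below keeps only branch references. It assumes reference names carry a `heads/` prefix (as in the `heads/main` example used elsewhere in this reference); `branch_heads` is a hypothetical helper, not a library function.

```python
def branch_heads(refs):
    """Return {branch name: commit checksum} for refs that start with "heads/"."""
    prefix = "heads/"  # assumed naming convention for branch references
    return {
        name[len(prefix):]: checksum
        for name, checksum in refs.items()
        if name.startswith(prefix)
    }
```

With a `Repository` instance `repo`, this would be called as `branch_heads(repo.get_refs())`.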

get_branch(branch)

Get the head commit of a branch

Parameters

branch (str) – the name of the branch

Return type

Commit

authenticate()

Exchanges the client id and secret for an RPT (requesting party token)

commit(branch, message, file, primary_key)

Creates a new commit

Parameters
  • branch (str) – name of the branch to commit under

  • message (str) – commit message

  • file (typing.BinaryIO) – the CSV file to commit

  • primary_key (list[str]) – list of column names that make up the primary key

Return type

CommitResult
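Because `file` is documented as a binary file object, rows held in memory need to be serialized first. The following is a rough sketch of one way to do that; `commit_rows` is a hypothetical helper, and the UTF-8 encoding is an assumption rather than something this reference states.

```python
import csv
import io


def commit_rows(repo, branch, message, header, rows, primary_key):
    """Serialize rows to an in-memory CSV and commit it via repo.commit()."""
    # Catch an obviously invalid primary key before sending anything.
    missing = [col for col in primary_key if col not in header]
    if missing:
        raise ValueError(f"primary key columns not in header: {missing}")

    text = io.StringIO(newline="")
    writer = csv.writer(text)
    writer.writerow(header)
    writer.writerows(rows)

    # commit() takes a binary file object, so encode the text (UTF-8 assumed).
    data = io.BytesIO(text.getvalue().encode("utf-8"))
    return repo.commit(branch, message, data, primary_key)
```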

get_commit_tree(head, max_depth)

Gets the commit tree starting from the given head

Parameters
  • head (str) – the root commit, given as either a reference name or a commit checksum.

  • max_depth (int) – maximum depth of commit tree to fetch

Return type

CommitTree

get_commit(commit_sum)

Get commit with the given checksum

Parameters

commit_sum (str) – commit checksum

Return type

Commit

get_table(table_sum)

Get table with the given checksum

Parameters

table_sum (str) – table checksum

Return type

Table

get_blocks(commit, start=None, end=None, with_column_names=True)

Fetches blocks as concatenated rows. Each row is returned as a list of strings.

Calling this with default start, end, and with_column_names will return the entire table.

Parameters
  • commit (str) – either commit checksum or reference e.g. “heads/main”

  • start (int) – index of the first block to fetch. Defaults to 0.

  • end (int) – index of the last block to fetch. If not set, fetches until the end.

  • with_column_names (bool) – prepend the column names to the resulting CSV, in effect producing a CSV with a header row.

Return type

typing.Iterator[list[str]]
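Since the rows arrive as plain lists of strings, it can be convenient to key them by column name. The sketch below does this for any iterator whose first row is the header, which is what the default `with_column_names=True` is described as producing; `rows_as_dicts` is a hypothetical helper.

```python
def rows_as_dicts(rows):
    """Turn an iterator of rows whose first row is the header into dicts."""
    rows = iter(rows)
    try:
        header = next(rows)
    except StopIteration:
        return  # empty input: yield nothing
    for row in rows:
        yield dict(zip(header, row))
```

With a `Repository` instance `repo`, this would be used as `rows_as_dicts(repo.get_blocks("heads/main"))`.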

get_table_blocks(table_sum, start=None, end=None, with_column_names=True)

Fetches blocks of the table with the given checksum.

Calling this with default start, end, and with_column_names will return the entire table.

Parameters
  • table_sum (str) – table checksum

  • start (int) – index of the first block to fetch. Defaults to 0.

  • end (int) – index of the last block to fetch. If not set, fetches until the end.

  • with_column_names (bool) – prepend the column names to the resulting CSV, in effect producing a CSV with a header row.

Return type

typing.Iterator[list[str]]

get_rows(commit, offsets)

Get rows at certain offsets. Each row will be returned as a list of strings.

This is usually used in tandem with row offsets from DiffResult to fetch changed rows.

Parameters
  • commit (str) – either commit checksum or reference e.g. “heads/main”

  • offsets (list[int]) – the offsets of the rows to fetch

Return type

typing.Iterator[list[str]]
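When the list of offsets is long, one option is to request the rows in smaller batches. This is only a sketch: `fetch_rows_in_batches` and its `batch_size` parameter are hypothetical, and this reference does not say whether the server limits how many offsets a single call may carry.

```python
def fetch_rows_in_batches(repo, commit, offsets, batch_size=1000):
    """Yield rows for offsets, calling repo.get_rows() on one batch at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    offsets = list(offsets)
    for i in range(0, len(offsets), batch_size):
        # Each call covers at most batch_size offsets, in the original order.
        yield from repo.get_rows(commit, offsets[i:i + batch_size])
```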

get_table_rows(table_sum, offsets)

Get rows at certain offsets from the table with the given checksum.

This is usually used in tandem with row offsets from DiffResult to fetch changed rows.

Parameters
  • table_sum (str) – table checksum

  • offsets (list[int]) – the offsets of the rows to fetch

Return type

typing.Iterator[list[str]]

diff(sum1, sum2)

Compares two commits and returns their differences.

Parameters
  • sum1 (str) – checksum of the first commit

  • sum2 (str) – checksum of the second commit

Return type

DiffResult

diff_reader(sum1, sum2, fetch_size=100)

Compares two commits and interprets their differences.

This method is higher level than Repository.diff() and should be preferred in most use cases.

Parameters
  • sum1 (str) – checksum of the first commit

  • sum2 (str) – checksum of the second commit

Return type

DiffReader
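Both diff methods take commit checksums, so comparing two references means resolving them first. A minimal sketch, assuming the names passed in match the keys returned by `get_refs()` (for example `heads/main`); `diff_refs` is a hypothetical helper.

```python
def diff_refs(repo, ref1, ref2):
    """Resolve two reference names to checksums and compare them."""
    refs = repo.get_refs()  # mapping from reference name to commit checksum
    try:
        sum1, sum2 = refs[ref1], refs[ref2]
    except KeyError as exc:
        raise LookupError(f"unknown reference: {exc.args[0]}") from None
    return repo.diff_reader(sum1, sum2)
```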