veoibd_synapse.data package

Submodules

veoibd_synapse.data.asset_intake module

Code supporting the information discovery and assimilation of data/file assets.

class veoibd_synapse.data.asset_intake.Row(path_hash, file_name, directory, batch_code, file_type, assay_type, bytes, subject_id)

Bases: tuple

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, path_hash, file_name, directory, batch_code, file_type, assay_type, bytes, subject_id)

Create new instance of Row(path_hash, file_name, directory, batch_code, file_type, assay_type, bytes, subject_id)

__repr__()

Return a nicely formatted representation string

_asdict()

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=tuple.__new__, len=len)

Make a new Row object from a sequence or iterable

_replace(_self, **kwds)

Return a new Row object replacing specified fields with new values

assay_type

Alias for field number 5

batch_code

Alias for field number 3

bytes

Alias for field number 6

directory

Alias for field number 2

file_name

Alias for field number 1

file_type

Alias for field number 4

path_hash

Alias for field number 0

subject_id

Alias for field number 7
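Row behaves like any collections.namedtuple. The sketch below recreates it from the field order documented above rather than importing veoibd_synapse, and shows the inherited helpers in use. The field values are made up for illustration.

```python
from collections import namedtuple

# Recreated from the documented field order; not imported from veoibd_synapse.
Row = namedtuple(
    "Row",
    ["path_hash", "file_name", "directory", "batch_code",
     "file_type", "assay_type", "bytes", "subject_id"],
)

# Hypothetical values describing a single asset.
row = Row._make([12345, "S01.bam", "/data/batch1", "Merck1",
                 "BAM", "WES", 1048576, "S01"])

print(row.file_type)                  # BAM  (field number 4)
print(row[6])                         # 1048576  (same as row.bytes)
print(row._asdict()["subject_id"])    # S01

# _replace returns a new Row; the original is unchanged because tuples are immutable.
relabeled = row._replace(batch_code="Merck2")
print(row.batch_code, relabeled.batch_code)   # Merck1 Merck2
```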

veoibd_synapse.data.asset_intake.build_asset_table(asset_conf, pathify=True)

Return the asset table as a pd.DataFrame built from the information in asset_conf.

Column Descriptions:
  • path_hash (int)
  • file_name (str)
  • directory (str)
  • batch_code (Category)
    • Regeneron1, Merck1, Merck2, etc.
  • file_type (Category)
    • BAM, VCF, GVCF, FASTQ, etc.
  • assay_type (Category)
    • WES, WGS, RNAseq, etc.
  • bytes (int)
  • subject_id (str)
Parameters:
  • asset_conf (dict-like) – configuration tree built from the asset_intake configuration file.
  • pathify (bool) – whether to run pathify_assets() on the paths in asset_conf before building the table.
Returns:

pd.DataFrame
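How each row's fields are filled in is not documented here. The sketch below shows one plausible way to derive the file-based columns (file_name, directory, bytes, path_hash) for a single existing file; row_from_path is a hypothetical helper, and the MD5-based path_hash is an assumption, not the package's actual scheme. The categorical values are simply passed in by the caller.

```python
import hashlib
from collections import namedtuple
from pathlib import Path

# Same field order as the documented Row namedtuple.
Row = namedtuple(
    "Row",
    ["path_hash", "file_name", "directory", "batch_code",
     "file_type", "assay_type", "bytes", "subject_id"],
)


def row_from_path(path, batch_code, file_type, assay_type, subject_id):
    """Hypothetical helper: build one Row describing an existing file.

    path_hash is assumed to be a stable integer derived from the resolved
    path. Python's built-in hash() is avoided because str hashes are
    randomized per process, so they would change between runs.
    """
    path = Path(path).resolve()
    digest = hashlib.md5(str(path).encode("utf-8")).hexdigest()
    return Row(
        path_hash=int(digest[:15], 16),  # 60 bits, so it fits in an int64 column
        file_name=path.name,
        directory=str(path.parent),
        batch_code=batch_code,
        file_type=file_type,
        assay_type=assay_type,
        bytes=path.stat().st_size,       # file size on disk
        subject_id=subject_id,
    )
```

A list of such rows can be turned into a table with pd.DataFrame(rows), which takes its column names from the namedtuple fields; batch_code, file_type and assay_type can then be converted with .astype("category").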

veoibd_synapse.data.asset_intake.pathify_assets(FILE_TYPE)

Convert each list of path glob patterns from the config file into a list of Path objects.

The conversion is done in place.

Parameters: FILE_TYPE (dict-like) – key=file type, val=list of path glob patterns
Returns: None
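A minimal sketch of the in-place conversion, assuming each glob pattern is expanded against the filesystem with glob.glob (recursive=True, so "**" matches nested directories; that choice is an assumption) and that matches are sorted for a stable order. It is not the package's own implementation.

```python
import glob
from pathlib import Path


def pathify_assets(FILE_TYPE):
    """Sketch: replace each list of glob patterns with matching Path objects.

    Assumes FILE_TYPE maps a file type (e.g. "BAM") to a list of glob
    pattern strings. The dict is modified in place and None is returned.
    """
    for file_type, patterns in FILE_TYPE.items():
        # Reassigning the value of an existing key while iterating is safe;
        # only adding or removing keys during iteration would raise an error.
        FILE_TYPE[file_type] = [
            Path(match)
            for pattern in patterns
            for match in sorted(glob.glob(str(pattern), recursive=True))
        ]
```

Because the dict is mutated rather than returned, callers keep using the same object after the call, which is why the documented return value is None.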

Module contents