veoibd_synapse.data package¶
Subpackages¶
Submodules¶
veoibd_synapse.data.asset_intake module¶
Code supporting the information discovery and assimilation of data/file assets.
-
class
veoibd_synapse.data.asset_intake.
Row
(path_hash, file_name, directory, batch_code, file_type, assay_type, bytes, subject_id)¶ Bases:
tuple
-
__getnewargs__
()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__
(_cls, path_hash, file_name, directory, batch_code, file_type, assay_type, bytes, subject_id)¶ Create new instance of Row(path_hash, file_name, directory, batch_code, file_type, assay_type, bytes, subject_id)
-
__repr__
()¶ Return a nicely formatted representation string
-
_asdict
()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make
(iterable, new=tuple.__new__, len=len)¶ Make a new Row object from a sequence or iterable
-
_replace
(_self, **kwds)¶ Return a new Row object replacing specified fields with new values
-
assay_type
¶ Alias for field number 5
-
batch_code
¶ Alias for field number 3
-
bytes
¶ Alias for field number 6
-
directory
¶ Alias for field number 2
-
file_name
¶ Alias for field number 1
-
file_type
¶ Alias for field number 4
-
path_hash
¶ Alias for field number 0
-
subject_id
¶ Alias for field number 7
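Because Row is a named tuple, each field can be read by name or by the field number listed above. The following is a minimal sketch rather than the package's own code: it rebuilds an equivalent Row with collections.namedtuple so that it runs without veoibd_synapse installed, and every sample value (hash, paths, subject ID) is hypothetical.

```python
from collections import namedtuple

# Stand-in for veoibd_synapse.data.asset_intake.Row with the same field order.
FIELDS = [
    "path_hash", "file_name", "directory", "batch_code",
    "file_type", "assay_type", "bytes", "subject_id",
]
Row = namedtuple("Row", FIELDS)

# Hypothetical values, for illustration only.
row = Row(
    path_hash=1001,
    file_name="subj01.bam",
    directory="/data/Merck1",
    batch_code="Merck1",
    file_type="BAM",
    assay_type="WES",
    bytes=2048,
    subject_id="subj01",
)

# Access by attribute name or by field number (file_type is field 4).
file_type_by_name = row.file_type
file_type_by_index = row[4]

# _replace returns a new Row; the original is left unchanged.
resized = row._replace(bytes=4096)

# _asdict maps field names to values; _make builds a Row from any iterable.
as_mapping = row._asdict()
rebuilt = Row._make(list(row))
```

Since tuples are immutable, _replace is the way to derive a modified record, and _make is convenient when the values arrive as a plain sequence.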
-
-
veoibd_synapse.data.asset_intake.
build_asset_table
(asset_conf, pathify=True)¶ Return the asset table as a pd.DataFrame built from the asset_conf info.
Column Descriptions:
- path_hash (int)
- file_name (str)
- directory (str)
- batch_code (Category): Regeneron1, Merck1, Merck2, etc.
- file_type (Category): BAM, VCF, GVCF, FASTQ, etc.
- assay_type (Category): WES, WGS, RNAseq, etc.
- bytes (int)
- subject_id (str)
Parameters:
- asset_conf (dict-like) – configuration tree built from the asset_intake configuration file.
- pathify (bool) – whether or not to run pathify_assets() on the paths in asset_conf.
Returns: pd.DataFrame
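To make the column layout concrete, here is a hedged sketch of a table with the documented columns and dtypes. It does not call build_asset_table(), since the structure of asset_conf is not described here. Instead it assembles a small pd.DataFrame by hand from hypothetical records and casts the three Category columns, which is an assumption about how the returned table could look.

```python
import pandas as pd

COLUMNS = [
    "path_hash", "file_name", "directory", "batch_code",
    "file_type", "assay_type", "bytes", "subject_id",
]

# Hypothetical records; a real table would come from build_asset_table(asset_conf).
records = [
    (1001, "subj01.bam", "/data/Merck1", "Merck1", "BAM", "WES", 2048, "subj01"),
    (1002, "subj02.bam", "/data/Merck2", "Merck2", "BAM", "WGS", 4096, "subj02"),
    (1003, "subj01.vcf", "/data/Merck1", "Merck1", "VCF", "WES", 512, "subj01"),
]

table = pd.DataFrame(records, columns=COLUMNS)

# The documented Category columns hold a small set of repeated labels.
for col in ("batch_code", "file_type", "assay_type"):
    table[col] = table[col].astype("category")

# Example queries: all BAM file names and the total size in bytes.
bam_files = table.loc[table["file_type"] == "BAM", "file_name"].tolist()
total_bytes = int(table["bytes"].sum())
```

Storing batch_code, file_type, and assay_type as categoricals keeps memory use low when the same few labels repeat across many files.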