MHCXGraph.utils.analysis

MHCXGraph.utils.analysis.chain_signature(node_tuple: tuple) → str

Compute a chain signature for a tuple of node labels.

Parameters:

node_tuple (tuple) – Tuple of residue labels for an associated node.

Returns:

Concatenated chain identifiers.

Return type:

str

MHCXGraph.utils.analysis.evaluate_all_frames_nodes(json_path: Path) → tuple[pandas.DataFrame, pandas.DataFrame]

Evaluate node coverage for all components and frames from a JSON file.

Parameters:

json_path (Path) – Path to the JSON produced by _make_json_from_associated_graph.

Returns:

df_fp_nodes (pandas.DataFrame) – Per frame and per protein metrics.
df_frames_nodes (pandas.DataFrame) – Frame level aggregated metrics.

MHCXGraph.utils.analysis.evaluate_all_frames_nodes_weighted(json_path: Path) → tuple[pandas.DataFrame, pandas.DataFrame]

Evaluate weighted node coverage summaries for all frames.

Parameters:

json_path (Path) – Path to the JSON produced by _make_json_from_associated_graph.

Returns:

df_fp_nodes (pandas.DataFrame) – Per frame and per protein coverage metrics.
df_frames_nodes_w (pandas.DataFrame) – Weighted summaries per frame.

MHCXGraph.utils.analysis.evaluate_frame_nodes(component_id: Any, frame_id: Any, data: dict[str, Any]) → tuple[pandas.DataFrame, dict[str, Any]]

Evaluate node coverage metrics for one frame across all proteins.

Parameters:

component_id (hashable) – Component identifier.
frame_id (hashable) – Frame identifier.
data (dict) – JSON payload as built by _make_json_from_associated_graph.

Returns:

df (pandas.DataFrame) – Per protein metrics for this frame.
summary (dict) – Aggregated summary for the frame.

MHCXGraph.utils.analysis.get_protein_keys(original_graphs: dict[Any, Any]) → list[str]

Return a list of protein keys sorted numerically if keys are numeric strings.

Parameters:

original_graphs (dict) – Mapping from internal id to original graph data.

Returns:

Ordered keys for proteins.

Return type:

list of str
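
The ordering described for get_protein_keys can be sketched as follows. This is an illustration only, not the library's code: the helper name `sorted_protein_keys` is hypothetical, and the fallback to plain string ordering for non-numeric keys is an assumption, since the documentation does not say what happens in that case.

```python
from typing import Any


def sorted_protein_keys(original_graphs: dict[Any, Any]) -> list[str]:
    """Hypothetical helper: order keys numerically when all are digit strings."""
    keys = [str(k) for k in original_graphs]
    if all(k.isdigit() for k in keys):
        # Numeric ordering places "10" after "2", unlike lexicographic sorting.
        return sorted(keys, key=int)
    # Assumed fallback: plain string ordering for non-numeric keys.
    return sorted(keys)


print(sorted_protein_keys({"10": {}, "2": {}, "1": {}}))  # ['1', '2', '10']
```

Sorting by `int` matters as soon as there are ten or more proteins, because string sorting alone would yield "1", "10", "2".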

MHCXGraph.utils.analysis.ivw_mean_proportions(cov, n)

Inverse variance weighted mean for proportions with shrinkage.

Parameters:

cov (array_like) – Coverage values between 0 and 1.
n (array_like) – Sample sizes.

Returns:

Weighted mean proportion estimate.

Return type:

float
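
The documentation does not specify the shrinkage used by ivw_mean_proportions, so the sketch below only illustrates the general idea under stated assumptions. Each proportion is shrunk toward 0.5 by adding one pseudo-success and one pseudo-failure (an assumed choice), the shrunk value is used to estimate a binomial variance p(1 - p)/n, and the raw coverages are averaged with weights 1/variance. The function name `ivw_mean_sketch` is hypothetical.

```python
def ivw_mean_sketch(cov, n):
    """Hypothetical inverse-variance weighted mean of proportions."""
    num = 0.0
    den = 0.0
    for c, size in zip(cov, n):
        if size <= 0:
            continue  # an empty sample carries no information
        # Assumed shrinkage: add-one smoothing keeps p strictly inside (0, 1),
        # so the variance below can never be zero.
        p = (c * size + 1.0) / (size + 2.0)
        var = p * (1.0 - p) / size  # binomial variance of a proportion
        weight = 1.0 / var
        num += weight * c
        den += weight
    return num / den if den > 0 else float("nan")


# A coverage measured on 1000 residues dominates one measured on 10.
print(ivw_mean_sketch([0.2, 0.8], [10, 1000]))
```

Without shrinkage, a coverage of exactly 0 or 1 would have zero estimated variance and therefore infinite weight, which is the usual motivation for smoothing before weighting.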

MHCXGraph.utils.analysis.node_similarity_for_protein(frame: dict[str, Any], original_graphs: dict[str, Any], protein_keys: list[str], p: int) → dict[str, Any] | None

Compute node coverage metrics for a single protein in one frame.

Parameters:

frame (dict) – Frame entry from the JSON payload built from AssociatedGraph.
original_graphs (dict) – Mapping from protein key to original graph data.
protein_keys (list of str) – Ordered protein keys.
p (int) – Index of the protein to evaluate.

Returns:

Coverage metrics for this protein and frame, or None if there are no nodes.

Return type:

dict or None

MHCXGraph.utils.analysis.project_nodes_instances(frame_nodes: list[Any], p: int) → list[str]

Project associated nodes onto the p-th protein.

Parameters:

frame_nodes (list) – List of associated graph nodes as tuples of residue labels.
p (int) – Index of the protein to project.

Returns:

Residue labels for protein p.

Return type:

list of str
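
As a rough illustration of the projection, assume each associated node is a tuple holding one residue label per protein, so that projecting onto protein p means taking element p of every tuple. The label format `chain:resname:resnum` and the name `project_sketch` are hypothetical, and whether the library removes duplicate labels is not stated, so this sketch keeps them.

```python
def project_sketch(frame_nodes, p):
    """Hypothetical projection: take the p-th label of every associated node."""
    return [str(node[p]) for node in frame_nodes]


frame_nodes = [("A:ARG:45", "A:LYS:46"), ("C:TYR:7", "C:PHE:7")]
print(project_sketch(frame_nodes, 1))  # ['A:LYS:46', 'C:PHE:7']
```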

MHCXGraph.utils.analysis.summarize_frame_nodes(df_fp_nodes_for_frame: pandas.DataFrame) → dict[str, Any]

Compute weighted summaries for node coverage across proteins in a frame.

Parameters:

df_fp_nodes_for_frame (pandas.DataFrame) – Per protein node coverage for one frame.

Returns:

Summary statistics including weighted mean, median and dispersion.

Return type:

dict

MHCXGraph.utils.analysis.unique_chain_signatures(frame_nodes: list[tuple]) → list[str]

Compute sorted unique chain signatures for all nodes in a frame.

Parameters:

frame_nodes (list of tuple) – List of associated nodes as tuples of residue labels.

Returns:

Sorted unique chain signatures.

Return type:

list of str
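
The relationship between a single chain signature and the sorted unique signatures of a frame might look like the sketch below. It rests on assumptions the documentation does not confirm: residue labels are taken to have the form `chain:resname:resnum`, and chain identifiers are joined without a separator. Both function names are hypothetical.

```python
def chain_signature_sketch(node_tuple):
    """Hypothetical: join the chain id of each residue label in the node."""
    # Assumed label format "chain:resname:resnum"; the chain id is the first field.
    return "".join(label.split(":")[0] for label in node_tuple)


def unique_chain_signatures_sketch(frame_nodes):
    """Hypothetical: sorted set of signatures over all nodes of a frame."""
    return sorted({chain_signature_sketch(node) for node in frame_nodes})


frame_nodes = [
    ("A:ARG:45", "A:LYS:46"),
    ("C:TYR:7", "A:PHE:7"),
    ("A:GLU:1", "A:ASP:2"),
]
print(unique_chain_signatures_sketch(frame_nodes))  # ['AA', 'CA']
```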

MHCXGraph.utils.analysis.wmean(x, w)

Weighted mean.

Parameters:

x (array_like) – Data values.
w (array_like) – Weights.

Returns:

Weighted mean.

Return type:

float
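
For reference, the standard weighted mean is sum(x_i * w_i) / sum(w_i). The sketch below implements that formula in plain Python. The name `wmean_sketch` is hypothetical, and returning NaN when the weights sum to zero is an assumption, not documented behaviour of wmean.

```python
def wmean_sketch(x, w):
    """Hypothetical weighted mean: sum(x_i * w_i) / sum(w_i)."""
    x, w = list(x), list(w)
    total = sum(w)
    if total == 0:
        return float("nan")  # assumed behaviour for an all-zero weight vector
    return sum(xi * wi for xi, wi in zip(x, w)) / total


print(wmean_sketch([1, 2, 3], [1, 1, 2]))  # 2.25
```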

MHCXGraph.utils.analysis.wmedian(x, w)

Weighted median.

Parameters:

x (array_like) – Data values.
w (array_like) – Weights.

Returns:

Weighted median.

Return type:

float
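
Several conventions exist for a weighted median, and the documentation does not say which one wmedian follows. The sketch below uses the lower weighted median: sort the values, accumulate their weights, and return the first value at which the cumulative weight reaches half of the total. The name `wmedian_sketch` is hypothetical, and other conventions (for example, interpolating between neighbours) give slightly different results.

```python
def wmedian_sketch(x, w):
    """Hypothetical lower weighted median."""
    pairs = sorted(zip(x, w))  # sort values, carrying their weights along
    total = sum(wi for _, wi in pairs)
    if total <= 0:
        return float("nan")  # assumed behaviour when there is no weight
    half = total / 2.0
    cum = 0.0
    for xi, wi in pairs:
        cum += wi
        if cum >= half:
            return float(xi)
    return float(pairs[-1][0])


print(wmedian_sketch([1, 2, 3], [1, 1, 5]))  # 3.0
```

With equal weights this reduces to the ordinary median for odd-length inputs, for example `wmedian_sketch([3, 1, 2], [1, 1, 1])` gives 2.0.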

MHCXGraph.utils.analysis.wstd(x, w)

Weighted standard deviation.

Parameters:

x (array_like) – Data values.
w (array_like) – Weights.

Returns:

Weighted standard deviation.

Return type:

float
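
A common definition of the weighted standard deviation is the square root of sum(w_i * (x_i - m)^2) / sum(w_i), where m is the weighted mean. The sketch below uses this population (uncorrected) form. Whether wstd applies a bias correction is not documented, so treat that choice, and the name `wstd_sketch`, as assumptions.

```python
import math


def wstd_sketch(x, w):
    """Hypothetical weighted standard deviation (population form)."""
    x, w = list(x), list(w)
    total = sum(w)
    if total == 0:
        return float("nan")  # assumed behaviour for an all-zero weight vector
    mean = sum(xi * wi for xi, wi in zip(x, w)) / total
    var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)) / total
    return math.sqrt(var)


print(wstd_sketch([1, 3], [1, 1]))  # 1.0
```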

MHCXGraph.utils.analysis.wtrimmed_mean(x, w, trim=0.1)

Weighted trimmed mean removing tails.

Parameters:

x (array_like) – Data values.
w (array_like) – Weights.
trim (float, optional) – Fraction to trim at each tail; defaults to 0.1.

Returns:

Weighted trimmed mean.

Return type:

float
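
One way to define a weighted trimmed mean is to lay the sorted observations end to end along the cumulative-weight axis and keep only the part of each observation's weight that falls between the `trim` and `1 - trim` fractions of the total. The sketch below follows that definition. It is an assumption about what "removing tails" means, not a description of the library's implementation, and `wtrimmed_mean_sketch` is a hypothetical name.

```python
def wtrimmed_mean_sketch(x, w, trim=0.1):
    """Hypothetical weighted trimmed mean over the central weight mass."""
    pairs = sorted(zip(x, w))
    total = sum(wi for _, wi in pairs)
    if total <= 0 or not 0 <= trim < 0.5:
        return float("nan")  # assumed behaviour for degenerate input
    lo, hi = trim * total, (1.0 - trim) * total
    num = den = cum = 0.0
    for xi, wi in pairs:
        start, end = cum, cum + wi
        cum = end
        # Weight of this observation that lies inside the kept window [lo, hi].
        kept = max(0.0, min(end, hi) - max(start, lo))
        num += kept * xi
        den += kept
    return num / den if den > 0 else float("nan")


# Trimming 20% of the weight at each tail discards the outlier 100.
print(wtrimmed_mean_sketch([0, 1, 2, 3, 100], [1, 1, 1, 1, 1], trim=0.2))
```

With `trim=0` the window covers all of the weight and the result equals the ordinary weighted mean.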

Functions

`chain_signature`(node_tuple)	Compute a chain signature for a tuple of node labels.
`evaluate_all_frames_nodes`(json_path)	Evaluate node coverage for all components and frames from a JSON file.
`evaluate_all_frames_nodes_weighted`(json_path)	Evaluate weighted node coverage summaries for all frames.
`evaluate_frame_nodes`(component_id, frame_id, ...)	Evaluate node coverage metrics for one frame across all proteins.
`get_protein_keys`(original_graphs)	Return a list of protein keys sorted numerically if keys are numeric strings.
`ivw_mean_proportions`(cov, n)	Inverse variance weighted mean for proportions with shrinkage.
`node_similarity_for_protein`(frame, ...)	Compute node coverage metrics for a single protein in one frame.
`project_nodes_instances`(frame_nodes, p)	Project associated nodes onto the p-th protein.
`summarize_frame_nodes`(df_fp_nodes_for_frame)	Compute weighted summaries for node coverage across proteins in a frame.
`unique_chain_signatures`(frame_nodes)	Compute sorted unique chain signatures for all nodes in a frame.
`wmean`(x, w)	Weighted mean.
`wmedian`(x, w)	Weighted median.
`wstd`(x, w)	Weighted standard deviation.
`wtrimmed_mean`(x, w[, trim])	Weighted trimmed mean removing tails.