MHCXGraph.utils.analysis

MHCXGraph.utils.analysis.chain_signature(node_tuple: tuple) → str

Compute a chain signature for a tuple of node labels.

Parameters:

node_tuple (tuple) – Tuple of residue labels for an associated node.

Returns:

Concatenated chain identifiers.

Return type:

str
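As an illustration only, the sketch below assumes residue labels of the hypothetical form "A:ARG:57" (chain, residue name, residue number separated by colons); the page does not state the actual label format, and `chain_signature_sketch` is not the library function.

```python
def chain_signature_sketch(node_tuple, sep=":"):
    """Concatenate the chain identifier of every label in the tuple."""
    # Assumption: the chain identifier is the text before the first separator.
    return "".join(str(label).split(sep, 1)[0] for label in node_tuple)


# One associated node pairing a residue on chain A with one on chain C.
sig = chain_signature_sketch(("A:ARG:57", "C:LYS:5"))
```

Under that assumed format, a node that pairs residues from chains A and C yields the signature "AC".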

MHCXGraph.utils.analysis.evaluate_all_frames_nodes(json_path: Path) → tuple[pandas.DataFrame, pandas.DataFrame]

Evaluate node coverage for all components and frames from a JSON file.

Parameters:

json_path (Path) – Path to the JSON produced by _make_json_from_associated_graph.

Returns:

  • df_fp_nodes (pandas.DataFrame) – Per frame and per protein metrics.

  • df_frames_nodes (pandas.DataFrame) – Frame level aggregated metrics.

MHCXGraph.utils.analysis.evaluate_all_frames_nodes_weighted(json_path: Path) → tuple[pandas.DataFrame, pandas.DataFrame]

Evaluate weighted node coverage summaries for all frames.

Parameters:

json_path (Path) – Path to the JSON produced by _make_json_from_associated_graph.

Returns:

  • df_fp_nodes (pandas.DataFrame) – Per frame and per protein coverage metrics.

  • df_frames_nodes_w (pandas.DataFrame) – Weighted summaries per frame.

MHCXGraph.utils.analysis.evaluate_frame_nodes(component_id: Any, frame_id: Any, data: dict[str, Any]) → tuple[pandas.DataFrame, dict[str, Any]]

Evaluate node coverage metrics for one frame across all proteins.

Parameters:
  • component_id (hashable) – Component identifier.

  • frame_id (hashable) – Frame identifier.

  • data (dict) – JSON payload as built by _make_json_from_associated_graph.

Returns:

  • df (pandas.DataFrame) – Per protein metrics for this frame.

  • summary (dict) – Aggregated summary for the frame.

MHCXGraph.utils.analysis.get_protein_keys(original_graphs: dict[Any, Any]) → list[str]

Return a list of protein keys sorted numerically if keys are numeric strings.

Parameters:

original_graphs (dict) – Mapping from internal id to original graph data.

Returns:

Ordered keys for proteins.

Return type:

list of str
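The ordering rule can be sketched as follows. This is a minimal illustration, not the library code: `sorted_protein_keys_sketch` is a hypothetical name, and the lexicographic fallback for non-numeric keys is an assumption, since the page only describes the numeric case.

```python
def sorted_protein_keys_sketch(original_graphs):
    """Return the mapping's keys as strings, in numeric order when possible."""
    keys = [str(k) for k in original_graphs]
    if keys and all(k.isdigit() for k in keys):
        # Numeric strings: "10" must come after "2", so compare as integers.
        return sorted(keys, key=int)
    # Assumption: any other key set falls back to plain string ordering.
    return sorted(keys)


keys = sorted_protein_keys_sketch({"10": {}, "2": {}, "1": {}})
```

Sorting by integer value matters because plain string sorting would place "10" before "2".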

MHCXGraph.utils.analysis.ivw_mean_proportions(cov, n)

Inverse variance weighted mean for proportions with shrinkage.

Parameters:
  • cov (array_like) – Coverage values between 0 and 1.

  • n (array_like) – Sample sizes.

Returns:

Weighted mean proportion estimate.

Return type:

float
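The page does not specify the shrinkage scheme, so the following is only one plausible reading: each proportion is pulled toward 0.5 with a pseudo-count `a` (a hypothetical parameter), its binomial variance is estimated from the shrunk value, and the shrunk proportions are averaged with weights equal to the inverse variances. The name `ivw_mean_sketch` and all details are assumptions.

```python
import numpy as np


def ivw_mean_sketch(cov, n, a=0.5):
    """Inverse-variance weighted mean of proportions (illustrative only)."""
    cov = np.asarray(cov, dtype=float)
    n = np.asarray(n, dtype=float)
    # Assumed shrinkage: add `a` pseudo-successes and `a` pseudo-failures,
    # which keeps the variance strictly positive when cov is 0 or 1.
    p = (cov * n + a) / (n + 2 * a)
    var = p * (1.0 - p) / n          # binomial variance of a proportion
    w = 1.0 / var                    # inverse-variance weights
    return float(np.sum(w * p) / np.sum(w))


estimate = ivw_mean_sketch([0.5, 0.5], [10, 10])
```

Without some form of shrinkage, a coverage of exactly 0 or 1 would have zero estimated variance and an infinite weight, which is presumably why the function applies it. The sketch requires every sample size to be positive.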

MHCXGraph.utils.analysis.node_similarity_for_protein(frame: dict[str, Any], original_graphs: dict[str, Any], protein_keys: list[str], p: int) → dict[str, Any] | None

Compute node coverage metrics for a single protein in one frame.

Parameters:
  • frame (dict) – Frame entry from the JSON payload built from AssociatedGraph.

  • original_graphs (dict) – Mapping from protein key to original graph data.

  • protein_keys (list of str) – Ordered protein keys.

  • p (int) – Index of the protein to evaluate.

Returns:

Coverage metrics for this protein and frame, or None if there are no nodes.

Return type:

dict or None
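The exact metrics and the layout of the JSON payload are not documented on this page. As a rough illustration of what a node coverage metric might look like, the sketch below takes the fraction of a protein's original nodes that appear in the frame. The function name, its flattened arguments, and the returned keys are all hypothetical.

```python
def node_coverage_sketch(frame_nodes, original_nodes, p):
    """Fraction of a protein's original nodes present in one frame."""
    original = set(original_nodes)
    if not original:
        # Mirrors the documented behaviour of returning None without nodes.
        return None
    # Take the p-th label of every associated node, dropping duplicates.
    projected = {node[p] for node in frame_nodes}
    covered = projected & original
    return {
        "n_original": len(original),
        "n_covered": len(covered),
        "coverage": len(covered) / len(original),
    }


frame_nodes = [("A:ARG:57", "A:ARG:60"), ("A:LYS:5", "A:LYS:8")]
original = ["A:ARG:57", "A:LYS:5", "A:GLU:9", "A:TYR:7"]
metrics = node_coverage_sketch(frame_nodes, original, p=0)
```

Here two of the four original residues of protein 0 appear in the frame, so the assumed coverage is 0.5.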

MHCXGraph.utils.analysis.project_nodes_instances(frame_nodes: list[Any], p: int) → list[str]

Project associated nodes onto the p-th protein.

Parameters:
  • frame_nodes (list) – List of associated graph nodes as tuples of residue labels.

  • p (int) – Index of the protein to project.

Returns:

Residue labels for protein p.

Return type:

list of str
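Projection can be pictured as picking the p-th entry of every associated node. The sketch below assumes each node is a plain tuple with one residue label per protein and keeps one entry per node without removing duplicates; `project_nodes_sketch` is a hypothetical stand-in, not the library function.

```python
def project_nodes_sketch(frame_nodes, p):
    """Return the p-th residue label of every associated node."""
    # Assumption: one entry is kept per node, duplicates included.
    return [str(node[p]) for node in frame_nodes]


nodes = [("A:ARG:57", "C:ARG:60"), ("A:LYS:5", "C:LYS:8")]
labels = project_nodes_sketch(nodes, p=1)
```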

MHCXGraph.utils.analysis.summarize_frame_nodes(df_fp_nodes_for_frame: pandas.DataFrame) → dict[str, Any]

Compute weighted summaries for node coverage across proteins in a frame.

Parameters:

df_fp_nodes_for_frame (pandas.DataFrame) – Per protein node coverage for one frame.

Returns:

Summary statistics including weighted mean, median and dispersion.

Return type:

dict
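The column names and the weighting variable used by the library are not given here. The sketch below assumes a per-protein table with hypothetical columns `coverage` and `n_nodes`, and weights each protein by its node count to produce a weighted mean, median, and standard deviation.

```python
import numpy as np
import pandas as pd


def summarize_frame_sketch(df, value_col="coverage", weight_col="n_nodes"):
    """Weighted mean, median and standard deviation of one column."""
    x = df[value_col].to_numpy(dtype=float)
    w = df[weight_col].to_numpy(dtype=float)
    mean = float(np.average(x, weights=w))
    # Weighted median: first sorted value whose cumulative weight
    # reaches half of the total weight.
    order = np.argsort(x)
    cum = np.cumsum(w[order])
    median = float(x[order][np.searchsorted(cum, 0.5 * cum[-1])])
    std = float(np.sqrt(np.average((x - mean) ** 2, weights=w)))
    return {"n_proteins": len(df), "wmean": mean, "wmedian": median, "wstd": std}


df = pd.DataFrame(
    {"protein": ["0", "1", "2"],
     "coverage": [0.5, 1.0, 0.75],
     "n_nodes": [10, 10, 20]}
)
summary = summarize_frame_sketch(df)
```

With these made-up numbers the weighted mean and weighted median both equal 0.75, because the protein with the most nodes sits in the middle of the distribution.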

MHCXGraph.utils.analysis.unique_chain_signatures(frame_nodes: list[tuple]) → list[str]

Compute sorted unique chain signatures for all nodes in a frame.

Parameters:

frame_nodes (list of tuple) – List of associated nodes as tuples of residue labels.

Returns:

Sorted unique chain signatures.

Return type:

list of str
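A minimal sketch of the deduplicate-then-sort step, again assuming the hypothetical "chain:resname:resnum" label format; the real label format and the library implementation may differ.

```python
def unique_chain_signatures_sketch(frame_nodes, sep=":"):
    """Sorted set of per-node chain signatures (illustrative only)."""
    signatures = {
        # Assumption: the chain identifier precedes the first separator.
        "".join(str(label).split(sep, 1)[0] for label in node)
        for node in frame_nodes
    }
    return sorted(signatures)


nodes = [
    ("A:ARG:57", "C:ARG:60"),
    ("A:LYS:5", "C:LYS:8"),
    ("A:GLU:9", "A:GLU:12"),
]
signatures = unique_chain_signatures_sketch(nodes)
```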

MHCXGraph.utils.analysis.wmean(x, w)

Weighted mean.

Parameters:
  • x (array_like) – Data values.

  • w (array_like) – Weights.

Returns:

Weighted mean.

Return type:

float
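For reference, the standard weighted mean is sum(w * x) / sum(w). The sketch below shows that formula with NumPy; `wmean_sketch` is an illustrative name, and how the library handles empty input or zero total weight is not documented here.

```python
import numpy as np


def wmean_sketch(x, w):
    """Standard weighted mean: sum(w * x) / sum(w)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.sum(w * x) / np.sum(w))


value = wmean_sketch([1, 2, 3], [1, 1, 2])  # (1 + 2 + 6) / 4
```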

MHCXGraph.utils.analysis.wmedian(x, w)

Weighted median.

Parameters:
  • x (array_like) – Data values.

  • w (array_like) – Weights.

Returns:

Weighted median.

Return type:

float
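Several definitions of a weighted median exist, and the page does not say which one is used. The sketch below takes a common one: sort the values and return the first whose cumulative weight reaches half of the total weight. `wmedian_sketch` is a hypothetical name.

```python
import numpy as np


def wmedian_sketch(x, w):
    """Smallest value whose cumulative weight reaches half the total."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    cum = np.cumsum(w[order])
    # searchsorted finds the first position where cum >= half the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return float(x[order][idx])


heavy_tail = wmedian_sketch([1, 2, 3], [1, 1, 5])
equal_weights = wmedian_sketch([3, 1, 2], [1, 1, 1])
```

With equal weights this reduces to the ordinary median for an odd number of values; a heavily weighted value pulls the result toward itself.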

MHCXGraph.utils.analysis.wstd(x, w)

Weighted standard deviation.

Parameters:
  • x (array_like) – Data values.

  • w (array_like) – Weights.

Returns:

Weighted standard deviation.

Return type:

float
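A minimal sketch of a weighted standard deviation, assuming the population form with no bias correction: the square root of the weighted mean of squared deviations from the weighted mean. Whether the library applies a correction term is not stated on this page.

```python
import numpy as np


def wstd_sketch(x, w):
    """Population-style weighted standard deviation (no bias correction)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mu = np.sum(w * x) / np.sum(w)
    var = np.sum(w * (x - mu) ** 2) / np.sum(w)
    return float(np.sqrt(var))


spread = wstd_sketch([1, 3], [1, 1])  # mean 2, squared deviations 1 and 1
```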

MHCXGraph.utils.analysis.wtrimmed_mean(x, w, trim=0.1)

Weighted mean computed after trimming both tails.

Parameters:
  • x (array_like) – Data values.

  • w (array_like) – Weights.

  • trim (float, optional) – Fraction to trim from each tail. Defaults to 0.1.

Returns:

Weighted trimmed mean.

Return type:

float
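How the trimming interacts with the weights is not specified here. One possible interpretation, sketched below, places each sorted value at the midpoint of its share of the cumulative weight and keeps only the values whose midpoint lies between `trim` and `1 - trim`. The function name and the fallback for an empty selection are assumptions.

```python
import numpy as np


def wtrimmed_mean_sketch(x, w, trim=0.1):
    """Weighted mean of values inside the central weight quantile band."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    xs, ws = x[order], w[order]
    # Position of each value on a 0..1 cumulative-weight scale (midpoint rule).
    pos = (np.cumsum(ws) - 0.5 * ws) / np.sum(ws)
    keep = (pos >= trim) & (pos <= 1.0 - trim)
    if not keep.any():
        # Assumption: fall back to all values rather than return NaN.
        keep[:] = True
    return float(np.sum(ws[keep] * xs[keep]) / np.sum(ws[keep]))


robust = wtrimmed_mean_sketch([0, 1, 2, 3, 100], [1, 1, 1, 1, 1], trim=0.2)
```

In this made-up example the outlier 100 and the smallest value 0 fall outside the central band, so the result is the mean of 1, 2 and 3.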

Functions

chain_signature(node_tuple)

Compute a chain signature for a tuple of node labels.

evaluate_all_frames_nodes(json_path)

Evaluate node coverage for all components and frames from a JSON file.

evaluate_all_frames_nodes_weighted(json_path)

Evaluate weighted node coverage summaries for all frames.

evaluate_frame_nodes(component_id, frame_id, ...)

Evaluate node coverage metrics for one frame across all proteins.

get_protein_keys(original_graphs)

Return a list of protein keys sorted numerically if keys are numeric strings.

ivw_mean_proportions(cov, n)

Inverse variance weighted mean for proportions with shrinkage.

node_similarity_for_protein(frame, ...)

Compute node coverage metrics for a single protein in one frame.

project_nodes_instances(frame_nodes, p)

Project associated nodes onto the p-th protein.

summarize_frame_nodes(df_fp_nodes_for_frame)

Compute weighted summaries for node coverage across proteins in a frame.

unique_chain_signatures(frame_nodes)

Compute sorted unique chain signatures for all nodes in a frame.

wmean(x, w)

Weighted mean.

wmedian(x, w)

Weighted median.

wstd(x, w)

Weighted standard deviation.

wtrimmed_mean(x, w[, trim])

Weighted mean computed after trimming both tails.