SOMOs documentation

somos.io.clean_logfile_name(logfile)

Given a log file name (e.g., ‘myfile.log’ or ‘myfile.log.gz’), returns the base calculation name without any ‘.log’ or ‘.log.gz’ extension.

Parameters:

logfile (str or Path) – The name of the Gaussian log file.

Returns:

The cleaned calculation name, without the ‘.log’ or ‘.log.gz’ suffix.

Return type:

str
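
As a rough illustration of the behavior described above, the suffix stripping could look like the following. `clean_name_sketch` is a hypothetical stand-in, not the library's code. Whether the real function keeps or drops a leading directory is not stated here, so dropping it is an assumption of this sketch.

```python
from pathlib import Path


def clean_name_sketch(logfile):
    """Strip a trailing '.log.gz' or '.log' from a file name (illustrative only)."""
    # Assumption: only the final path component is of interest.
    name = Path(logfile).name
    # Test the longer suffix first so that 'x.log.gz' does not become 'x.log'.
    for suffix in (".log.gz", ".log"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


print(clean_name_sketch("myfile.log"))
print(clean_name_sketch(Path("runs") / "myfile.log.gz"))
```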

somos.io.extract_gaussian_info(logfile_path)

Extracts molecular orbital and structural information from a Gaussian log file using cclib.

Parameters:

logfile_path (str) – Path to the Gaussian .log file.

Returns:

A dictionary containing UDFT/DFT type, basis size, molecular orbitals, geometry, occupation, HOMO index, spin values, and the AO overlap matrix.

Return type:

dict

somos.io.load_mos_from_cclib(logfolder, filename)

Loads molecular orbital data from Gaussian output using cclib and organizes them into DataFrames.

Parameters:
  • logfolder (str or Path) – Directory containing the log file.

  • filename (str) – Name of the Gaussian .log file.

Returns:

Alpha and beta DataFrames, coefficient matrices, basis count, overlap matrix, and full info dictionary.

Return type:

tuple

somos.cosim.analyzeSimilarity(logfolder, logfile)

Full analysis pipeline to extract, match, and compare alpha and beta molecular orbitals. Displays interactive similarity widgets and saves annotated similarity results to Excel.

Parameters:
  • logfolder (str or Path) – Path to the folder containing the Gaussian log file.

  • logfile (str) – Filename of the Gaussian log file.

Returns:

Alpha/beta DataFrames, coefficient matrices, nbasis, SOMO DataFrame, and overlap matrix.

Return type:

tuple

somos.cosim.build_full_similarity_table(lMOs, cMOs, nbasis, overlap_matrix, lumo_plusAlpha=5, lumo_plusBeta=5)

Builds a similarity matrix between selected alpha and beta MOs and returns optimal matches.

Parameters:
  • lMOs (tuple of pd.DataFrame) – Alpha and beta orbital DataFrames.

  • cMOs (tuple of np.ndarray) – Coefficient matrices for alpha and beta MOs.

  • nbasis (int) – Number of basis functions.

  • overlap_matrix (np.ndarray) – Overlap matrix.

  • lumo_plusAlpha (int) – Number of virtual alpha orbitals to include beyond LUMO.

  • lumo_plusBeta (int) – Number of virtual beta orbitals to include beyond LUMO.

Returns:

DataFrame with matches, similarity matrix, and selected alpha indices.

Return type:

tuple

somos.cosim.cluster_orbitals(MOs, spin='alpha')

Performs hierarchical clustering of molecular orbitals based on cosine similarity.

Parameters:
  • MOs (tuple of np.ndarray) – Tuple containing coefficient matrices for alpha and beta orbitals.

  • spin (str) – Spin type to cluster (‘alpha’ or ‘beta’).

somos.cosim.cosine_similarity_with_overlap(ci, cj, S)

Computes the cosine similarity between two coefficient vectors using an overlap matrix.

Parameters:
  • ci (np.ndarray) – Coefficient vector i.

  • cj (np.ndarray) – Coefficient vector j.

  • S (np.ndarray) – Overlap matrix.

Returns:

Cosine similarity between ci and cj.

Return type:

float
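
A minimal NumPy sketch of a cosine similarity under an overlap metric may help here. It assumes the usual definition, cos = (ci^T S cj) / sqrt((ci^T S ci)(cj^T S cj)); the function name `cosine_similarity_sketch` and the numbers are illustrative and are not taken from the package.

```python
import numpy as np


def cosine_similarity_sketch(ci, cj, S):
    """Cosine of the angle between ci and cj under the metric S (illustrative)."""
    num = ci @ S @ cj                              # S-weighted scalar product
    den = np.sqrt((ci @ S @ ci) * (cj @ S @ cj))   # product of the S-norms
    return float(num / den)


# Two AO basis functions with an overlap of 0.5 (made-up numbers).
S = np.array([[1.0, 0.5], [0.5, 1.0]])
ci = np.array([1.0, 0.0])
cj = np.array([0.0, 1.0])

print(cosine_similarity_sketch(ci, cj, S))  # 0.5
```

With a plain dot product these two vectors would look orthogonal; the overlap matrix accounts for the non-orthogonality of the AO basis.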

somos.cosim.cross_match_all(alpha_df, beta_df, alpha_mat, beta_mat, nbasis, overlap_matrix, n_virtual_alpha=0)

Matches alpha and beta MOs by maximizing similarity and computes their pairwise similarity and energy difference.

Parameters:
  • alpha_df (pd.DataFrame) – DataFrame for alpha orbitals.

  • beta_df (pd.DataFrame) – DataFrame for beta orbitals.

  • alpha_mat (np.ndarray) – Alpha orbital coefficients.

  • beta_mat (np.ndarray) – Beta orbital coefficients.

  • nbasis (int) – Number of basis functions.

  • overlap_matrix (np.ndarray) – Overlap matrix.

  • n_virtual_alpha (int) – Number of virtual alpha orbitals to include.

Returns:

Table with matching alpha-beta pairs, similarity scores, and energy differences.

Return type:

pd.DataFrame
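
The matching step can be pictured with a small sketch. It assumes a precomputed alpha-by-beta similarity matrix and a simple row-wise argmax, which is not necessarily the strategy used by the package (a one-to-one assignment is equally plausible). The name `greedy_match_sketch`, the sign convention of the energy difference (beta minus alpha), and all numbers are assumptions.

```python
import numpy as np


def greedy_match_sketch(sim, e_alpha, e_beta):
    """For each alpha MO (row of sim), pick the beta MO (column) with the highest similarity."""
    matches = []
    for i, j in enumerate(np.argmax(sim, axis=1)):
        matches.append({
            "alpha": i + 1,                          # 1-based MO indices
            "beta": int(j) + 1,
            "similarity": float(sim[i, j]),
            "dE": float(e_beta[j] - e_alpha[i]),     # assumed convention: E(beta) - E(alpha)
        })
    return matches


# Made-up similarities for 2 alpha and 3 beta orbitals, with made-up energies (Ha).
sim = np.array([[0.98, 0.10, 0.05],
                [0.08, 0.20, 0.95]])
e_alpha = np.array([-0.50, -0.30])
e_beta = np.array([-0.49, -0.10, -0.25])

for row in greedy_match_sketch(sim, e_alpha, e_beta):
    print(row)
```

A row-wise argmax can assign the same beta orbital to several alpha orbitals, which is one reason a real implementation might prefer a global assignment.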

somos.cosim.find_somo_candidates(alpha_df, beta_df, alpha_mat, beta_mat, nbasis, overlap_matrix, spin, threshold=0.99)

Identifies singly occupied molecular orbital (SOMO) candidates by comparing similarities between occupied alpha and all beta orbitals.

Parameters:
  • alpha_df (pd.DataFrame) – DataFrame for alpha orbitals.

  • beta_df (pd.DataFrame) – DataFrame for beta orbitals.

  • alpha_mat (np.ndarray) – Alpha orbital coefficients.

  • beta_mat (np.ndarray) – Beta orbital coefficients.

  • nbasis (int) – Number of basis functions.

  • overlap_matrix (np.ndarray) – Overlap matrix.

  • spin (dict) – Spin information with three keys: spin[“S2”], the eigenvalue of the S² operator (float); spin[“S”], the S value (float); spin[“multiplicity”], the spin multiplicity computed as 2S+1 (float).

  • threshold (float) – Maximum allowed similarity for SOMO detection.

Returns:

Table listing SOMO-like orbital pairs and their properties.

Return type:

pd.DataFrame
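
The three entries of the `spin` dictionary are related through the standard identities ⟨S²⟩ = S(S+1) and multiplicity = 2S+1. The sketch below rebuilds such a dictionary from an S² value; `spin_dict_sketch` is a hypothetical helper, and how the package itself fills the dictionary is not documented here.

```python
import math


def spin_dict_sketch(s2):
    """Build a spin dictionary from an S^2 eigenvalue, using S(S+1) = S2."""
    # Positive root of S^2 + S - s2 = 0.
    s = (-1.0 + math.sqrt(1.0 + 4.0 * s2)) / 2.0
    return {"S2": s2, "S": s, "multiplicity": 2.0 * s + 1.0}


print(spin_dict_sketch(0.75))  # doublet: S = 0.5, multiplicity = 2.0
print(spin_dict_sketch(2.0))   # triplet: S = 1.0, multiplicity = 3.0
```

For a spin-contaminated unrestricted calculation, S² deviates from the ideal value and the resulting multiplicity is no longer an integer, which is consistent with its float type.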

somos.cosim.heatmap_MOs(lMOs, cMOs, nbasis, overlap_matrix, logfolder='./logs', logfilename='logfile.log')

Interactive cosine similarity heatmap between alpha and beta MOs around the HOMO-LUMO frontier.

Parameters:
  • lMOs (tuple of pd.DataFrame) – Alpha and beta orbital DataFrames.

  • cMOs (tuple of np.ndarray) – Coefficient matrices for alpha and beta orbitals.

  • nbasis (int) – Number of basis functions.

  • overlap_matrix (np.ndarray) – Overlap matrix.

  • logfolder (str) – Directory to save the heatmap PNG.

  • logfilename (str) – Filename used as prefix for saving.

somos.cosim.interactive_similarity(alpha_df, beta_df, alpha_mat, beta_mat, overlap_matrix)

Interactive widget to compute and display scalar product and cosine similarity between selected alpha and beta MOs using the overlap matrix.

Parameters:
  • alpha_df (pd.DataFrame) – DataFrame for alpha orbitals.

  • beta_df (pd.DataFrame) – DataFrame for beta orbitals.

  • alpha_mat (np.ndarray) – Coefficient matrix for alpha orbitals.

  • beta_mat (np.ndarray) – Coefficient matrix for beta orbitals.

  • overlap_matrix (np.ndarray) – Overlap matrix.

somos.cosim.save_similarity_per_somo_from_df(df_SOMOs, lMOs, cMOs, nbasis, overlap_matrix, logfolder, logfile)

Saves one Excel sheet per SOMO candidate listing similarities with all beta MOs, sorted by decreasing similarity. Best match is highlighted in yellow.

Parameters:
  • df_SOMOs (pd.DataFrame) – DataFrame with identified SOMO candidates.

  • lMOs (tuple of pd.DataFrame) – Alpha and beta orbital DataFrames.

  • cMOs (tuple of np.ndarray) – Alpha and beta coefficient matrices.

  • nbasis (int) – Number of basis functions.

  • overlap_matrix (np.ndarray) – Overlap matrix.

  • logfolder (str or Path) – Folder containing the log file.

  • logfile (str) – Name of the log file.

somos.cosim.scalar_product_with_overlap(ci, cj, S)

Computes the scalar product between two coefficient vectors using an overlap matrix.

Parameters:
  • ci (np.ndarray) – Coefficient vector i.

  • cj (np.ndarray) – Coefficient vector j.

  • S (np.ndarray) – Overlap matrix.

Returns:

Scalar product ci^T S cj.

Return type:

float

somos.cosim.tsne(lMOs, cMOs, overlap_matrix, logfolder='./logs', logfilename='logfile.log')

Performs a t-SNE projection of molecular orbitals (alpha and beta) using a cosine similarity metric invariant to phase, and displays an interactive Plotly visualization.

Parameters:
  • lMOs (tuple of pd.DataFrame) – DataFrames for alpha and beta molecular orbitals.

  • cMOs (tuple of np.ndarray) – Coefficient matrices for alpha and beta orbitals.

  • overlap_matrix (np.ndarray) – Overlap matrix used for computing cosine similarity.

  • logfolder (str) – Path to the folder where the plot image will be saved.

  • logfilename (str) – Name of the Gaussian log file used to prefix saved plots.
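
One way to obtain a phase-invariant metric is to use d = 1 - |cos|, so that an orbital and its sign-flipped copy are at distance zero. The sketch below builds such a distance matrix with NumPy; the exact metric used by `tsne` is not spelled out above, so this form, the name `phase_invariant_distance_sketch`, and the row-per-orbital layout are assumptions.

```python
import numpy as np


def phase_invariant_distance_sketch(C, S):
    """Pairwise distances 1 - |cos| between the rows of C under the metric S."""
    gram = C @ S @ C.T                   # all S-weighted scalar products
    norms = np.sqrt(np.diag(gram))       # S-norm of each orbital
    cos = gram / np.outer(norms, norms)
    dist = 1.0 - np.abs(cos)             # the sign (phase) of an orbital is irrelevant
    np.fill_diagonal(dist, 0.0)          # remove rounding noise on the diagonal
    return np.clip(dist, 0.0, 1.0)


# Three toy orbitals in an orthonormal basis; the second is the first with flipped sign.
S = np.eye(2)
C = np.array([[1.0, 0.0],
              [-1.0, 0.0],
              [0.0, 1.0]])

print(phase_invariant_distance_sketch(C, S))
```

A matrix of this kind can be handed to an embedding that accepts precomputed distances, for example scikit-learn's `TSNE` with `metric="precomputed"` (which also requires `init="random"`).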

somos.proj.diagonalize_alpha_occ_to_beta_occ_and_virt_separately(logfolder, logfile, threshold=0.15)

Projects occupied alpha orbitals separately onto beta occupied and beta virtual subspaces, diagonalizes the two projection matrices, and analyzes dominant contributions.

Parameters:
  • logfolder (str) – Folder containing the Gaussian log file.

  • logfile (str) – Name of the Gaussian log file.

  • threshold (float) – Minimum squared coefficient to consider a beta orbital as dominant (default: 0.15).
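
The idea of projecting and diagonalizing can be sketched as follows. This is only one plausible reading: it assumes that the projection matrix is P = C_alpha_occ · S · C_beta_sub^T (orbitals stored as rows) and that the quantity diagonalized is the symmetric product P P^T. The name `projection_eigenvalues_sketch` and the toy data are hypothetical, and the `threshold` parameter is not used here.

```python
import numpy as np


def projection_eigenvalues_sketch(alpha_occ, beta_sub, S):
    """Eigenvalues of P P^T, with P the projection of occupied alpha MOs onto a beta subspace."""
    P = alpha_occ @ S @ beta_sub.T        # shape (n_alpha_occ, n_beta_sub)
    return np.linalg.eigvalsh(P @ P.T)    # real eigenvalues in ascending order


# Toy system: orthonormal AO basis, 2 occupied alpha MOs, 1 occupied and 2 virtual beta MOs.
S = np.eye(3)
alpha_occ = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
beta = np.eye(3)
n_beta_occ = 1

print(projection_eigenvalues_sketch(alpha_occ, beta[:n_beta_occ], S))  # occupied beta block
print(projection_eigenvalues_sketch(alpha_occ, beta[n_beta_occ:], S))  # virtual beta block
```

Under these assumptions, an eigenvalue close to 1 in the virtual block signals a combination of occupied alpha orbitals that has no occupied beta counterpart, which is the signature of a singly occupied orbital.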

somos.proj.parse_beta_contrib_string(s)
somos.proj.project_occupied_alpha_onto_beta(logfolder, logfile, threshold_beta=15)

Projects each occupied alpha orbital onto the full set of beta orbitals (occupied + virtual) using the AO overlap matrix. Returns a summary DataFrame including projection norms, dominant beta contributions, and diagnostic flags.

Parameters:
  • logfolder (str) – Path to the folder containing the Gaussian log file.

  • logfile (str) – Name of the Gaussian log file.

  • threshold_beta (float, optional) – Percentage threshold (default: 15%) above which a beta orbital is considered significant in the projection.

Returns:

DataFrame with one row per occupied alpha orbital and the following columns:
  • ‘Alpha OM’ – Index (1-based) of the alpha orbital.

  • ‘Occ α’ – Occupation of the alpha orbital (usually ‘O’).

  • ‘Energy (Ha)’ – Energy of the alpha orbital.

  • ‘P² on β_virt’ – Squared norm of the projection onto the virtual beta space.

  • ‘P² on β_occ’ – Squared norm of the projection onto the occupied beta space.

  • ‘Dominant β MO’ – Index (1-based) of the beta orbital with the largest projection.

  • ‘Index4Jmol’ – Jmol-compatible index for the dominant beta orbital.

  • ‘Occ β’ – Occupation of the dominant beta orbital (‘V’ or ‘O’).

  • ‘E (β, Ha)’ – Energy of the dominant beta orbital.

  • ‘Top 1 (%)’ – Percentage of the total projection norm carried by the most contributing beta orbital.

  • ‘Top 2 (%)’ – Cumulative contribution of the top 2 beta orbitals.

  • ‘Top 3 (%)’ – Cumulative contribution of the top 3 beta orbitals.

  • ‘Spread?’ – Flag indicating whether the projection is distributed (“Yes” if the dominant contribution is below 60%).

  • ‘β orbitals >{threshold_beta}%’ – List of tuples (MO index (1-based), contribution (%)) for the beta orbitals contributing more than threshold_beta percent.

  • ‘SOMO P2v?’ – “Yes” for occupied alpha MOs whose projection lies mainly in the virtual beta space and only weakly in the occupied beta space (see Notes for the exact criterion).

  • ‘SOMO dom. β MO?’ – “Yes” for occupied alpha MOs whose dominant beta MO is a virtual orbital.

Return type:

pd.DataFrame

Notes

The projection of an occupied alpha orbital \(\phi^\alpha_i\) onto the full beta space is computed as:

\[\mathbf{v}_i = \phi^\alpha_i \cdot S \cdot (\phi^\beta)^T\]

where \(S\) is the AO overlap matrix, and \(\phi^\beta\) is the matrix of beta MOs. The squared norm \(\|\mathbf{v}_i\|^2\) represents the total overlap.

Top-N contributions are computed by squaring the individual projections \(v_{ij}\), sorting them, and evaluating the cumulative contributions from the top 1, 2, or 3 beta orbitals. These are returned as “Top 1 (%)”, “Top 2 (%)”, and “Top 3 (%)”.

The column “β orbitals >{threshold_beta}%” lists all beta orbitals contributing more than the specified percentage to the squared projection norm, with both their index (1-based) and contribution in percent.

The flag “SOMO P2v?” is set to “Yes” for occupied alpha MOs if the squared projection on the virtual beta subspace is >= 0.5, and the projection on the occupied beta subspace is strictly below 0.5.

The flag “SOMO dom. β MO?” is a weaker criterion. It is set to “Yes” for occupied alpha MOs if the dominant beta MO is a virtual orbital.

The total number of beta orbitals \(N\) used in the projection is equal to the total number of molecular orbitals in the beta spin channel. The projection is performed over the complete beta space, regardless of occupation.
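
The quantities described in these notes can be reproduced on toy data with a short NumPy sketch. It follows the stated formulas and thresholds, but it is not the package's implementation: the function name `projection_summary_sketch`, the assumption that beta orbitals are stored as rows with the occupied ones first, and the reduced set of columns are all illustrative.

```python
import numpy as np


def projection_summary_sketch(alpha_occ, beta, S, n_beta_occ, threshold_beta=15.0):
    """Project each occupied alpha MO (row) onto all beta MOs (rows, occupied first)."""
    rows = []
    for i, phi in enumerate(alpha_occ):
        v = phi @ S @ beta.T                   # v_ij for every beta MO j
        w = v ** 2                             # squared individual projections
        p2_occ = float(w[:n_beta_occ].sum())
        p2_virt = float(w[n_beta_occ:].sum())
        pct = 100.0 * w / w.sum()              # share of the squared norm, in percent
        order = np.argsort(pct)[::-1]          # beta MOs by decreasing contribution
        top = np.cumsum(pct[order])[:3]        # cumulative top 1/2/3 (needs >= 3 beta MOs)
        dominant = int(order[0])
        rows.append({
            "Alpha OM": i + 1,
            "P² on β_virt": p2_virt,
            "P² on β_occ": p2_occ,
            "Dominant β MO": dominant + 1,
            "Occ β": "O" if dominant < n_beta_occ else "V",
            "Top 1 (%)": float(top[0]),
            "Top 2 (%)": float(top[1]),
            "Top 3 (%)": float(top[2]),
            "Spread?": "Yes" if top[0] < 60.0 else "No",
            f"β orbitals >{threshold_beta:g}%": [
                (int(j) + 1, float(pct[j])) for j in order if pct[j] > threshold_beta
            ],
            "SOMO P2v?": "Yes" if (p2_virt >= 0.5 and p2_occ < 0.5) else "No",
            "SOMO dom. β MO?": "Yes" if dominant >= n_beta_occ else "No",
        })
    return rows


# Toy data: orthonormal AO basis, beta MOs are the unit vectors, one occupied beta MO.
S = np.eye(3)
beta = np.eye(3)
alpha_occ = np.array([[1.0, 0.0, 0.0],
                      [0.0, 0.8, 0.6]])

for row in projection_summary_sketch(alpha_occ, beta, S, n_beta_occ=1):
    print(row)
```

In this example the second alpha orbital projects entirely onto the virtual beta space, so both SOMO flags are “Yes”, while the first one coincides with the occupied beta orbital and is not flagged.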

somos.proj.projection_heatmap_from_df(df, nbasis, logfolder='./logs', logfile='logfile.log')
somos.proj.show_alpha_to_homo(df_proj, logfolder, logfile, highlight_somo=True)

Displays the rows of the df_proj DataFrame corresponding to the alpha orbitals from α 1 up to the HOMO, with optional highlighting of SOMOs.

Parameters:
  • df_proj (pd.DataFrame) – DataFrame containing the alpha → beta projection results.

  • logfolder (str) – Folder containing the log file.

  • logfile (str) – Name of the log file.

  • highlight_somo (bool) – If True, highlights in yellow the rows where SOMO P2v? == “Yes”, and in orange the rows where SOMO P2v? == “No” but SOMO dom. β MO? == “Yes”.

Returns:

A styled or raw subset of the DataFrame.

Return type:

pd.DataFrame or Styler