SOMOs documentation¶
Download the scientific documentation here:
projection-v2.pdf
- somos.io.extract_gaussian_info(logfile_path)¶
Extracts molecular orbital and structural information from a Gaussian log file using cclib.
- Parameters:
logfile_path (str) – Path to the Gaussian .log file.
- Returns:
A dictionary containing UDFT/DFT type, basis size, molecular orbitals, geometry, occupation, HOMO index, spin values, and the AO overlap matrix.
- Return type:
dict
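The fields listed above correspond closely to attributes that cclib exposes on its parsed-data object. The following is only a sketch of how such a dictionary might be assembled: it assumes the cclib attribute names `mocoeffs`, `moenergies`, `homos`, `nbasis` and `aooverlaps`, while the helper name `summarize_parsed` and the dictionary keys are hypothetical and are not the ones used by somos.

```python
from types import SimpleNamespace

import numpy as np


def summarize_parsed(data):
    """Collect orbital information from a cclib-style parsed object.

    `data` is expected to expose the cclib attributes `mocoeffs`,
    `moenergies`, `homos`, `nbasis` and, when available, `aooverlaps`.
    The dictionary keys below are illustrative only.
    """
    unrestricted = len(data.mocoeffs) == 2  # one coefficient block per spin
    return {
        "unrestricted": unrestricted,
        "nbasis": int(data.nbasis),
        "mocoeffs": [np.asarray(c) for c in data.mocoeffs],
        "moenergies": [np.asarray(e) for e in data.moenergies],
        "homo_index": [int(h) for h in data.homos],  # 0-based in cclib
        "overlap": getattr(data, "aooverlaps", None),
    }


# With cclib installed, a real log file would be parsed with:
#     from cclib.io import ccread
#     data = ccread("path/to/calculation.log")
# A stand-in object is used here so the example runs without a log file.
fake = SimpleNamespace(
    mocoeffs=[np.eye(3), np.eye(3)],
    moenergies=[np.array([-0.5, -0.2, 0.1]), np.array([-0.4, 0.0, 0.2])],
    homos=np.array([1, 0]),
    nbasis=3,
    aooverlaps=np.eye(3),
)
info = summarize_parsed(fake)
print(info["unrestricted"], info["homo_index"])
```

The stand-in mimics an unrestricted calculation with two alpha and one beta occupied orbitals; the path in the comment is a placeholder.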
- somos.io.load_mos_from_cclib(logfolder, filename)¶
Loads molecular orbital data from Gaussian output using cclib and organizes them into DataFrames.
- Parameters:
logfolder (str or Path) – Directory containing the log file.
filename (str) – Name of the Gaussian .log file.
- Returns:
Alpha and beta DataFrames, coefficient matrices, basis count, overlap matrix, and full info dictionary.
- Return type:
tuple
- somos.cosim.analyzeSimilarity(logfolder, logfile)¶
Full analysis pipeline to extract, match, and compare alpha and beta molecular orbitals. Displays interactive similarity widgets and saves annotated similarity results to Excel.
- Parameters:
logfolder (str or Path) – Path to the folder containing the Gaussian log file.
logfile (str) – Filename of the Gaussian log file.
- Returns:
Alpha/beta DataFrames, coefficient matrices, nbasis, SOMO DataFrame, and overlap matrix.
- Return type:
tuple
- somos.cosim.build_full_similarity_table(lMOs, cMOs, nbasis, overlap_matrix, lumo_plusAlpha=5, lumo_plusBeta=5)¶
Builds a similarity matrix between selected alpha and beta MOs and returns optimal matches.
- Parameters:
lMOs (tuple of pd.DataFrame) – Alpha and beta orbital DataFrames.
cMOs (tuple of np.ndarray) – Coefficient matrices for alpha and beta MOs.
nbasis (int) – Number of basis functions.
overlap_matrix (np.ndarray) – Overlap matrix.
lumo_plusAlpha (int) – Number of virtual alpha orbitals to include beyond LUMO.
lumo_plusBeta (int) – Number of virtual beta orbitals to include beyond LUMO.
- Returns:
DataFrame with matches, similarity matrix, and selected alpha indices.
- Return type:
tuple
- somos.cosim.cluster_orbitals(MOs, spin='alpha')¶
Performs hierarchical clustering of molecular orbitals based on cosine similarity.
- Parameters:
MOs (tuple of np.ndarray) – Tuple containing coefficient matrices for alpha and beta orbitals.
spin (str) – Spin type to cluster (‘alpha’ or ‘beta’).
- somos.cosim.cosine_similarity_with_overlap(ci, cj, S)¶
Computes the cosine similarity between two coefficient vectors using an overlap matrix.
- Parameters:
ci (np.ndarray) – Coefficient vector i.
cj (np.ndarray) – Coefficient vector j.
S (np.ndarray) – Overlap matrix.
- Returns:
Cosine similarity between ci and cj.
- Return type:
float
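A common way to define a cosine similarity in a non-orthogonal AO basis is to normalise the overlap-weighted scalar product ci^T S cj by the S-norms of both vectors. The sketch below follows that definition; it is an assumption about the formula, not a copy of the somos implementation, and the helper names are illustrative.

```python
import numpy as np


def scalar_product(ci, cj, S):
    """Overlap-weighted scalar product ci^T S cj."""
    return float(ci @ S @ cj)


def cosine_similarity(ci, cj, S):
    """Scalar product normalised by the S-norms of both vectors."""
    norm_i = np.sqrt(scalar_product(ci, ci, S))
    norm_j = np.sqrt(scalar_product(cj, cj, S))
    return scalar_product(ci, cj, S) / (norm_i * norm_j)


# Two normalised, non-orthogonal basis functions with overlap 0.5.
S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
ci = np.array([1.0, 0.0])
cj = np.array([0.0, 1.0])

sim = cosine_similarity(ci, cj, S)
print(sim)  # 0.5
```

Because of the normalisation, rescaling either vector leaves the result unchanged, whereas a sign flip of one vector changes the sign of the similarity.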
- somos.cosim.cross_match_all(alpha_df, beta_df, alpha_mat, beta_mat, nbasis, overlap_matrix, n_virtual_alpha=0)¶
Matches alpha and beta MOs by maximizing similarity and computes their pairwise similarity and energy difference.
- Parameters:
alpha_df (pd.DataFrame) – DataFrame for alpha orbitals.
beta_df (pd.DataFrame) – DataFrame for beta orbitals.
alpha_mat (np.ndarray) – Alpha orbital coefficients.
beta_mat (np.ndarray) – Beta orbital coefficients.
nbasis (int) – Number of basis functions.
overlap_matrix (np.ndarray) – Overlap matrix.
n_virtual_alpha (int) – Number of virtual alpha orbitals to include.
- Returns:
Table with matching alpha-beta pairs, similarity scores, and energy differences.
- Return type:
pd.DataFrame
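One simple way to realise such a matching is to build the full alpha/beta similarity matrix and pick, for each alpha orbital, the beta orbital with the largest absolute similarity. The sketch below assumes this row-wise maximum, with MOs stored in rows; the actual function may use a different strategy (for example a one-to-one assignment), and the name `match_alpha_to_beta` is hypothetical.

```python
import numpy as np


def match_alpha_to_beta(alpha_mat, beta_mat, S, e_alpha, e_beta):
    """For each alpha MO (row), find the most similar beta MO (row)."""
    overlap = alpha_mat @ S @ beta_mat.T  # <alpha_i | beta_j>
    norm_a = np.sqrt(np.einsum("ib,bc,ic->i", alpha_mat, S, alpha_mat))
    norm_b = np.sqrt(np.einsum("jb,bc,jc->j", beta_mat, S, beta_mat))
    sim = np.abs(overlap) / np.outer(norm_a, norm_b)  # phase-independent
    best = sim.argmax(axis=1)  # index of the best beta partner per alpha MO
    rows = np.arange(len(best))
    return best, sim[rows, best], e_alpha - e_beta[best]


# Toy data: orthonormal AO basis, beta MOs are a signed permutation of alpha MOs.
S = np.eye(3)
alpha_mat = np.eye(3)
beta_mat = np.array([[0.0, -1.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
e_alpha = np.array([-0.50, -0.30, -0.10])
e_beta = np.array([-0.28, -0.49, 0.05])

best, score, delta_e = match_alpha_to_beta(alpha_mat, beta_mat, S, e_alpha, e_beta)
print(best, score, delta_e)
```

Taking the absolute value makes the match insensitive to the arbitrary sign of each orbital, which is why the second alpha orbital still pairs with the sign-flipped first beta orbital.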
- somos.cosim.find_somo_candidates(alpha_df, beta_df, alpha_mat, beta_mat, nbasis, overlap_matrix, spin, threshold=0.99)¶
Identifies singly occupied molecular orbital (SOMO) candidates by comparing similarities between occupied alpha and all beta orbitals.
- Parameters:
alpha_df (pd.DataFrame) – DataFrame for alpha orbitals.
beta_df (pd.DataFrame) – DataFrame for beta orbitals.
alpha_mat (np.ndarray) – Alpha orbital coefficients.
beta_mat (np.ndarray) – Beta orbital coefficients.
nbasis (int) – Number of basis functions.
overlap_matrix (np.ndarray) – Overlap matrix.
spin (dict) – Spin information: spin["S2"] is the eigenvalue of the S² operator (float), spin["S"] is the S value (float), and spin["multiplicity"] is the spin multiplicity (float, computed as 2S+1).
threshold (float) – Maximum allowed similarity for SOMO detection.
- Returns:
Table listing SOMO-like orbital pairs and their properties.
- Return type:
pd.DataFrame
- somos.cosim.heatmap_MOs(lMOs, cMOs, nbasis, overlap_matrix, logfolder='./logs', logfilename='logfile.log')¶
Interactive cosine similarity heatmap between alpha and beta MOs around the HOMO-LUMO frontier.
- Parameters:
lMOs (tuple of pd.DataFrame) – Alpha and beta orbital DataFrames.
cMOs (tuple of np.ndarray) – Coefficient matrices for alpha and beta orbitals.
nbasis (int) – Number of basis functions.
overlap_matrix (np.ndarray) – Overlap matrix.
logfolder (str) – Directory to save the heatmap PNG.
logfilename (str) – Filename used as prefix for saving.
- somos.cosim.interactive_similarity(alpha_df, beta_df, alpha_mat, beta_mat, overlap_matrix)¶
Interactive widget to compute and display scalar product and cosine similarity between selected alpha and beta MOs using the overlap matrix.
- Parameters:
alpha_df (pd.DataFrame) – DataFrame for alpha orbitals.
beta_df (pd.DataFrame) – DataFrame for beta orbitals.
alpha_mat (np.ndarray) – Coefficient matrix for alpha orbitals.
beta_mat (np.ndarray) – Coefficient matrix for beta orbitals.
overlap_matrix (np.ndarray) – Overlap matrix.
- somos.cosim.save_similarity_per_somo_from_df(df_SOMOs, lMOs, cMOs, nbasis, overlap_matrix, logfolder, logfile)¶
Saves one Excel sheet per SOMO candidate listing similarities with all beta MOs, sorted by decreasing similarity. Best match is highlighted in yellow.
- Parameters:
df_SOMOs (pd.DataFrame) – DataFrame with identified SOMO candidates.
lMOs (tuple of pd.DataFrame) – Alpha and beta orbital DataFrames.
cMOs (tuple of np.ndarray) – Alpha and beta coefficient matrices.
nbasis (int) – Number of basis functions.
overlap_matrix (np.ndarray) – Overlap matrix.
logfolder (str or Path) – Folder containing the log file.
logfile (str) – Name of the log file.
- somos.cosim.scalar_product_with_overlap(ci, cj, S)¶
Computes the scalar product between two coefficient vectors using an overlap matrix.
- Parameters:
ci (np.ndarray) – Coefficient vector i.
cj (np.ndarray) – Coefficient vector j.
S (np.ndarray) – Overlap matrix.
- Returns:
Scalar product ci^T S cj.
- Return type:
float
- somos.cosim.tsne(lMOs, cMOs, overlap_matrix, logfolder='./logs', logfilename='logfile.log')¶
Performs a t-SNE projection of molecular orbitals (alpha and beta) using a cosine similarity metric invariant to phase, and displays an interactive Plotly visualization.
- Parameters:
lMOs (tuple of pd.DataFrame) – DataFrames for alpha and beta molecular orbitals.
cMOs (tuple of np.ndarray) – Coefficient matrices for alpha and beta orbitals.
overlap_matrix (np.ndarray) – Overlap matrix used for computing cosine similarity.
logfolder (str) – Path to the folder where the plot image will be saved.
logfilename (str) – Name of the Gaussian log file used to prefix saved plots.
- somos.proj.compute_orbital_projections(lMOs, cMOs, overlap_matrix)¶
Computes how much each alpha orbital is represented in the beta orbital space using the AO overlap matrix S.
- Parameters:
lMOs (tuple of DataFrames) – Tuple (alpha_df, beta_df), each containing orbital metadata.
cMOs (tuple of np.ndarray) – Tuple (alpha_mat, beta_mat), each of shape (n_orbs, n_basis), with rows as orbitals.
overlap_matrix (np.ndarray) – AO overlap matrix, shape (n_basis, n_basis).
- Returns:
DataFrame with alpha orbital number, energy, occupation, and squared projection norm.
- Return type:
pd.DataFrame
- somos.proj.compute_projection_matrix_and_eigenvalues(lMOs, cMOs, nbasis, overlap_matrix)¶
Computes the projection matrix P = A Aᵀ where A = alpha_occ · S · beta.T, and returns its eigenvalues and eigenvectors.
- Parameters:
lMOs (tuple) – Tuple containing two DataFrames: (alpha_df, beta_df), each with MO occupations and energies.
cMOs (tuple) – Tuple of two np.ndarrays: (alpha_mat, beta_mat), each of shape (n_MOs, n_basis). MOs are stored in rows (i.e., row i = MO_i).
nbasis (int) – Number of basis functions.
overlap_matrix (np.ndarray) – AO overlap matrix (shape: n_basis, n_basis).
- Returns:
eigenvalues (np.ndarray) – Eigenvalues of the projection matrix P.
eigenvectors (np.ndarray) – Eigenvectors of the projection matrix P.
P (np.ndarray) – The projection matrix P = A Aᵀ.
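The construction of P can be sketched in a few lines of NumPy. The sketch assumes that the beta block entering A contains only the occupied beta orbitals (an assumption; the description above only says "beta"), that MOs are stored in rows as stated, and that `np.linalg.eigh` is an acceptable diagonaliser since P is symmetric. The function name `projection_eigen` is hypothetical.

```python
import numpy as np


def projection_eigen(alpha_occ, beta_occ, S):
    """Return eigenvalues, eigenvectors and P = A A^T, with A = alpha_occ S beta_occ^T."""
    A = alpha_occ @ S @ beta_occ.T
    P = A @ A.T
    eigenvalues, eigenvectors = np.linalg.eigh(P)  # P is symmetric
    return eigenvalues, eigenvectors, P


# Toy system: orthonormal AO basis, three occupied alpha MOs, two occupied beta MOs.
S = np.eye(3)
alpha_occ = np.eye(3)
beta_occ = np.eye(3)[:2]

eigenvalues, eigenvectors, P = projection_eigen(alpha_occ, beta_occ, S)
print(np.round(eigenvalues, 6))
```

In this toy case two eigenvalues equal 1 (alpha orbitals fully contained in the occupied beta space) and one equals 0: the third alpha orbital has no occupied beta partner, which is the kind of low eigenvalue that print_eigen_analysis treats as a possible SOMO signature.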
- somos.proj.identify_somos_from_projection(logfolder, logfile)¶
Identifies potential SOMOs by projecting occupied alpha orbitals onto the beta orbital space.
- Parameters:
logfolder (str) – Path to the folder containing the Gaussian log file.
logfile (str) – Name of the Gaussian .log file.
This function:
- Loads orbital data from the log file.
- Computes the projection matrix P = A Aᵀ where A = α_occ · S · βᵀ.
- Diagonalizes P and plots its eigenvalues.
- Flags alpha orbitals with eigenvalues > 0.5 that project mainly onto virtual beta orbitals.
- somos.proj.parse_beta_contrib_string(s)¶
- somos.proj.print_eigen_analysis(eigenvalues, threshold=0.8)¶
Prints and analyzes the eigenvalues of the projection matrix.
- Parameters:
eigenvalues (np.ndarray) – Eigenvalues of the projection matrix (real and ≥ 0).
threshold (float) – Eigenvalues below this threshold are considered “low” (possible SOMO signature).
- somos.proj.project_occupied_alpha_onto_beta(logfolder, logfile, threshold_beta=20)¶
Projects each occupied alpha orbital onto the full set of beta orbitals (occupied + virtual) using the AO overlap matrix. Returns a summary DataFrame including projection norms, dominant beta contributions, and diagnostic flags.
- Parameters:
logfolder (str) – Path to the folder containing the Gaussian log file.
logfile (str) – Name of the Gaussian log file.
threshold_beta (float, optional) – Percentage threshold (default: 20) above which a beta orbital is considered significant in the projection.
- Returns:
DataFrame with one row per occupied alpha orbital and the following columns:
- ‘Alpha OM’: Index (1-based) of the alpha orbital
- ‘Occ α’: Occupation of the alpha orbital (usually ‘O’)
- ‘Energy (Ha)’: Energy of the alpha orbital
- ‘Projection² on β_virtual’: Squared norm of the projection onto the virtual beta space
- ‘Projection² on β_occupied’: Squared norm of the projection onto the occupied beta space
- ‘Dominant β MO’: Index (1-based) of the beta orbital with the largest projection
- ‘Index4Jmol’: Jmol-compatible index for the dominant beta orbital
- ‘Occ β’: Occupation of the dominant beta orbital (‘V’ or ‘O’)
- ‘E (β, Ha)’: Energy of the dominant beta orbital
- ‘Top 1 contrib (%)’: Percentage of the total projection norm carried by the most contributing beta orbital
- ‘Top 2 contrib (%)’: Cumulative contribution of the top 2 beta orbitals
- ‘Top 3 contrib (%)’: Cumulative contribution of the top 3 beta orbitals
- ‘Dominance ratio’: Largest single contribution / total projection
- ‘Spread?’: Flag indicating whether the projection is distributed (“Yes” if <60% dominance)
- ‘β orbitals >{threshold_beta}%’: List of tuples [MO index (1-based), contribution (%)] for beta orbitals contributing more than threshold_beta percent
- ‘SOMO?’: “Yes” if the projection is dominant onto the virtual space and small on the occupied space
- Return type:
pd.DataFrame
Notes
The projection of an occupied alpha orbital \( \phi^\alpha_i \) onto the full beta space is computed as:
\[ \mathbf{v}_i = \phi^\alpha_i \cdot S \cdot (\phi^\beta)^T \]
where \( S \) is the AO overlap matrix, and \( \phi^\beta \) is the matrix of beta MOs. The squared norm \( \|\mathbf{v}_i\|^2 \) represents the total overlap.
Top-N contributions are computed by squaring the individual projections \( v_{ij} \), sorting them, and evaluating the cumulative contributions from the top 1, 2, or 3 beta orbitals. These are returned as “Top 1 contrib (%)”, “Top 2 contrib (%)”, and “Top 3 contrib (%)”.
The column “β orbitals >{threshold_beta}%” lists all beta orbitals contributing more than the specified percentage to the squared projection norm, with both their index (1-based) and contribution in percent.
The flag “SOMO?” is set to “Yes” if the squared projection on the virtual beta subspace is greater than 0.5, and the projection on the occupied beta subspace is below 0.5.
The total number of beta orbitals \( N \) used in the projection is equal to the total number of molecular orbitals in the beta spin channel. The projection is performed over the complete beta space, regardless of occupation.
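The quantities described in these notes can be illustrated for a single alpha orbital as follows. This is a minimal sketch under stated assumptions: beta MOs are stored in rows with the occupied ones first, the dictionary keys and the function name `project_alpha_orbital` are hypothetical, and the thresholds (0.5 for the SOMO flag, 20 % for significant contributions) are the ones quoted above.

```python
import numpy as np


def project_alpha_orbital(alpha_vec, beta_mat, S, n_beta_occ, threshold_beta=20.0):
    """Project one alpha MO onto all beta MOs and summarise the contributions."""
    v = alpha_vec @ S @ beta_mat.T   # v_ij for every beta MO j
    sq = v ** 2                      # squared individual projections
    total = sq.sum()                 # squared norm of the projection
    pct = 100.0 * sq / total
    order = np.argsort(sq)[::-1]     # beta MOs sorted by decreasing contribution
    proj_occ = sq[:n_beta_occ].sum()
    proj_virt = sq[n_beta_occ:].sum()
    return {
        "proj2_occ": float(proj_occ),
        "proj2_virt": float(proj_virt),
        "dominant_beta": int(order[0]) + 1,      # 1-based index
        "top_contrib": np.cumsum(pct[order])[:3],  # cumulative top 1, 2, 3 (%)
        "dominance": float(sq[order[0]] / total),
        "significant": [(int(j) + 1, float(pct[j]))
                        for j in order if pct[j] > threshold_beta],
        "somo": bool(proj_virt > 0.5 and proj_occ < 0.5),
    }


# Toy system: orthonormal AO basis, four beta MOs of which the first two are occupied.
S = np.eye(4)
beta_mat = np.eye(4)
alpha_vec = np.array([0.1, 0.0, 0.9, np.sqrt(0.18)])  # normalised alpha MO

result = project_alpha_orbital(alpha_vec, beta_mat, S, n_beta_occ=2)
print(result)
```

Here 99 % of the alpha orbital lies in the virtual beta space, dominated by beta orbital 3 with 81 %, so the orbital is flagged as a SOMO and would not count as spread under the 60 % dominance criterion.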
- somos.proj.projection_heatmap_from_df(df, nbasis, logfolder='./logs', logfile='logfile.log')¶
- somos.proj.show_alpha_to_homo(df_proj, logfolder, logfile, highlight_somo=True)¶
Displays the rows of the DataFrame df_proj corresponding to the alpha orbitals from α 1 up to the HOMO, with optional highlighting of SOMOs.
- Parameters:
df_proj (pd.DataFrame) – DataFrame containing the alpha → beta projection results.
logfolder (str) – Folder containing the log file.
logfile (str) – Name of the log file.
highlight_somo (bool) – If True, highlights the rows where SOMO? == “Yes”.
- Returns:
A styled or raw subset of the DataFrame.
- Return type:
pd.DataFrame or Styler