LLM for Everyone - Introductory Course on Language Models with Python and Azure OpenAI¶

Thoughts¶

  • Give the prompt more information about the context of the survey and what was asked. This matters especially when we want to summarize the general sentiment of the feedback in each category. Were people satisfied? Do they want measures for improvement?

Task¶

You work at an IT company and have been asked to analyze the responses from an internal employee survey. The survey is anonymous, and you have received a CSV file with 50 feedback entries, one per employee. The goal is to find out what people are satisfied or dissatisfied with, and in particular to look more closely at the topics Network, Training and IT-support, which management is especially interested in. Feedback that does not fit these categories should also be accounted for. Finally, you will write a summary that can be sent to management.

To work efficiently, you use a language model to help with both categorization and summarization.

The task is therefore to use a language model to categorize and summarize the feedback from the survey.

Dataset¶

In [1]:
# Import the raw feedback data, one row per employee. All employees answered the survey.

import pandas as pd
pd.set_option('display.max_colwidth', None) # Ensure no truncated output of dataframe

enr_path = "../files/IT_survey.csv"
df = pd.read_csv(enr_path)
df.head()
Out[1]:
ID Category Feedback
0 1 Training The recent training sessions on new software updates provided clear guidance, though sometimes their rapid pace left me wishing for more practical examples.
1 2 Training The interactive training modules are well-designed but occasionally overwhelm with too many complex details, balancing excitement with mild frustration.
2 3 Training I appreciate how the sessions cover both the basics and advanced features, yet the limited time for Q&A sometimes leaves lingering doubts.
3 4 Training The hands-on exercises are engaging and boost confidence, though I occasionally struggle to keep up with the fast pace.
4 5 Training The training materials are succinct and creatively presented, but rapid changes in content sometimes create a disconnect.

Using a language model through an API¶

A basic LLM query¶

Simple LLM queries are built from a few central parts:

  1. A connection to an API, e.g. Azure OpenAI

  2. A prompt, i.e. a text-based request or instruction

  3. Sending the prompt to the language model to get a response

When setting up the connection, it is common to specify a temperature parameter. This parameter controls how much randomness the model uses when it picks the next token. It is usually given as a number from 0 to 1 (the OpenAI API accepts values up to 2), and some tools describe it on a scale from "low" to "high". With a low value, close to 0, you allow little variation and creativity, and the responses are safer and more predictable. With a high value, close to 1, you allow more creativity and variation, but the responses also become less predictable.
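
To make the effect of the temperature more concrete, here is a minimal sketch of how a temperature value reshapes a probability distribution over candidate next tokens. The token scores are made up for illustration, and the function name `softmax_with_temperature` is hypothetical. It is meant as an intuition aid under these assumptions, not as a description of how any particular model is implemented.

```python
import math


def softmax_with_temperature(scores: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw token scores into probabilities, scaled by a temperature."""
    # A temperature of exactly 0 would divide by zero, so clamp it to a small value.
    t = max(temperature, 1e-6)
    scaled = {token: score / t for token, score in scores.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    highest = max(scaled.values())
    exps = {token: math.exp(value - highest) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Made-up scores for three candidate next tokens (an assumption for this sketch).
scores = {"Hello": 2.0, "Hi": 1.5, "Ahoy": 0.5}

low = softmax_with_temperature(scores, temperature=0.1)
high = softmax_with_temperature(scores, temperature=1.0)

for label, probs in [("low (0.1)", low), ("high (1.0)", high)]:
    print(label, {token: round(p, 3) for token, p in probs.items()})
```

At the low temperature nearly all of the probability lands on the top-scoring token, which is why the answers become more predictable. At the higher temperature the alternatives get a real chance of being picked.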

/// TODO: Intro to language models

  • How to call a language model
  • What the temperature parameter is
In [29]:
# Connect through the API
from langchain_openai import AzureChatOpenAI
from dotenv import find_dotenv, load_dotenv
import os


# Get environment variables
load_dotenv(find_dotenv(), override=True)


llm = AzureChatOpenAI(
    azure_deployment="gpt-4o-mini",
    model=os.environ.get("OPENAI_MODEL_GPT_4O-MINI", default="gpt-4o-mini"),
    temperature=0,
)

reasoning_llm = AzureChatOpenAI(
    azure_deployment="o3-mini",
    model="o3-mini",
    reasoning_effort="medium",
)
/usr/lib/python3/dist-packages/IPython/core/interactiveshell.py:3377: UserWarning: WARNING! reasoning_effort is not default parameter.
                reasoning_effort was transferred to model_kwargs.
                Please confirm that reasoning_effort is what you intended.
  if (await self.run_code(code, result,  async_=asy)):
In [4]:
# Generate the prompt
prompt = 'Hei!'

# Send the prompt and receive a response
response = llm.invoke(prompt)

# Show the response from the model
response
Out[4]:
AIMessage(content='Hei! Hvordan kan jeg hjelpe deg i dag?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 9, 'total_tokens': 20, 'completion_tokens_details': {'audio_tokens': 0, 'reasoning_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': 'fp_7a53abb7a2', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'protected_material_code': {'filtered': False, 'detected': False}, 'protected_material_text': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}, id='run-72f284f0-a931-450d-97b6-a4163894492f-0', usage_metadata={'input_tokens': 9, 'output_tokens': 11, 'total_tokens': 20, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
In [5]:
# Show only the content of the response
response.content
Out[5]:
'Hei! Hvordan kan jeg hjelpe deg i dag?'

Creating a basic chain with LangChain Expression Language (LCEL)¶

LCEL is a way to build and run so-called chains in LangChain. Chains connect different AI components. For example, language models, data sources and logic can be linked into one coherent process, i.e. a chain. LCEL provides a standardized syntax for defining these chains, and it is convenient because you do not have to write all the glue code by hand. In other words, you get a "recipe" for how your AI components should work together in a fast and scalable way.

Benefits of LCEL

  1. Supports parallel and asynchronous execution - Different parts of the chain can run at the same time, and the system can handle several requests at once, so tasks are processed faster.
  2. Streaming of results - You can start seeing the answer while the model is still generating it. (Particularly well suited for chat-based solutions.)
  3. Easy debugging with LangSmith - When chains become complex, it is important to see what happened along the way. With LangSmith tracing enabled, LCEL logs each step to LangSmith, which makes debugging easier.
  4. Standardized - All LCEL chains use the same interface, which makes them easy to combine and reuse across projects.
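
The first two benefits can be illustrated without any language model at all. The sketch below uses a stand-in function, `fake_model`, in place of a real model call, and the helper names `run_batch` and `run_stream` are hypothetical. It only shows what batching and streaming mean conceptually and is not how LangChain implements them internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


def fake_model(role: str) -> str:
    """Stand-in for a language model call (an assumption for this sketch)."""
    return f"Hello from a {role}!"


def run_batch(roles: list[str]) -> list[str]:
    """Handle several inputs concurrently and return the results in input order."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(fake_model, roles))


def run_stream(role: str) -> Iterator[str]:
    """Yield the answer piece by piece instead of all at once."""
    words = fake_model(role).split(" ")
    for index, word in enumerate(words):
        # Re-attach the space that split() removed, except after the last word.
        yield word if index == len(words) - 1 else word + " "


print(run_batch(["pirate", "cowboy", "ninja"]))

for chunk in run_stream("pirate"):
    print(chunk, end="")
print()
```

Batching pays off when each call spends most of its time waiting for a remote service, while streaming mainly improves how responsive the application feels to the user.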

LCEL uses a pipe operator (|) to connect the steps in a chain. It takes the output of one component and passes it directly as input to the next. LCEL chains are often combined with PromptTemplate, which can be thought of as a template for the text you send to the language model. PromptTemplate makes it easy to create dynamic messages by inserting variables into the text, a bit like a recipe where you fill in what is missing before it is sent to the model. The benefit is that you can write one template and reuse it with different data. It also helps you separate the text from the logic, and it reduces the errors that can arise from manual string manipulation. We will now look at some examples with LCEL that use the pipe operator and PromptTemplate.
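
To build some intuition for the pipe operator before using the real thing, the following toy sketch shows how `|` can be made to chain steps in plain Python by defining `__or__`. The class `Step` and the stand-ins `fake_llm` and `parser` are hypothetical and exist only for this illustration. LangChain's own classes are considerably more capable.

```python
class Step:
    """A toy chain step that wraps a plain function."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds the output of `a` into `b`.
        return Step(lambda value: other.invoke(self.invoke(value)))


# A template step: fills the {role} placeholder from a dict of variables.
template = Step(lambda variables: "Hi! Please talk like a {role}.".format(**variables))
# A stand-in for the language model: it simply upper-cases the prompt.
fake_llm = Step(lambda prompt: prompt.upper())
# A stand-in for an output parser: strips the trailing full stop.
parser = Step(lambda text: text.rstrip("."))

toy_chain = template | fake_llm | parser
print(toy_chain.invoke({"role": "pirate"}))
```

The same template can be reused with any role, and each step only needs to know about its own input and output, which is the property that makes chains easy to recombine.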

Source: xxx

// TODO:

  • Benefits of LCEL
  • Explain the pipe operator
  • Explain PromptTemplate
In [11]:
from langchain_core.prompts import PromptTemplate
prompt = PromptTemplate.from_template(
    """
    Hi! Please talk like a {role}.
    """
)
prompt
Out[11]:
PromptTemplate(input_variables=['role'], input_types={}, partial_variables={}, template='\n    Hi! Please talk like a {role}.\n    ')
In [12]:
chain = prompt | llm

chain.invoke({"role": "pirate"})
Out[12]:
AIMessage(content="Ahoy, matey! What be ye wantin' to chat about on this fine day upon the high seas? Arrr!", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 28, 'prompt_tokens': 18, 'total_tokens': 46, 'completion_tokens_details': {'audio_tokens': 0, 'reasoning_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini', 'system_fingerprint': 'fp_7a53abb7a2', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}, id='run-781d52df-6bd3-4ce1-892f-41bf5507aa5a-0', usage_metadata={'input_tokens': 18, 'output_tokens': 28, 'total_tokens': 46, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})
In [13]:
from langchain_core.output_parsers import StrOutputParser
chain2 = chain | StrOutputParser()

chain2.invoke({"role": "pirate"})
Out[13]:
"Ahoy, matey! What be ye wantin' to chat about on this fine day? Be it treasure maps, sea shanties, or tales of the high seas? Speak up, and let’s set sail on a grand adventure! Arrr! 🏴\u200d☠️"

Batching and streaming¶

In [14]:
chain2.batch(
    [
        {"role": "pirate"},
        {"role": "cowboy"},
        {"role": "ninja"},
    ]
)
Out[14]:
["Ahoy, matey! What be ye wantin' to chat about on this fine day upon the high seas? Arrr!",
 "Well howdy there, partner! What brings ya 'round these parts? If yer lookin' for a good ol' yarn or some advice on ridin' the range, I’m all ears. Just remember, life’s a wild ride, so keep yer hat on tight and yer boots polished! What’s on yer mind, friend?",
 'Greetings, silent shadow. The night whispers secrets, and the wind carries the tales of the unseen. What knowledge do you seek, traveler of the hidden path? Stealth and wisdom await. Speak, and I shall share the way of the ninja. 🥷✨']
In [15]:
from time import sleep
for chunk in chain2.stream({"role": "pirate"}):
    print(chunk, end="")
    sleep(0.2)
Ahoy, matey! What be ye wantin' to chat about on this fine day upon the high seas? Arrr!

Simple categorization of each survey reply¶

Use the LLM to categorize the feedback¶

In the first round of categorization, we want to see how many of the feedback entries fit the categories management wanted extra focus on, namely Network, Training and IT-support. We can ask the LLM to perform this categorization by passing it the feedback data. Remember the central parts of a simple LLM call, and use LCEL as shown in the previous example.

In [16]:
categorize_prompt = PromptTemplate.from_template(
"""
Categorize the following feedback into one of the following categories:
- Network
- Training
- IT-support
- Other

Feedback:
<feedback>
{feedback}
</feedback>
"""
)

categorize_chain = categorize_prompt | llm | StrOutputParser()

categorize_chain.invoke({"feedback": "I am very happy with the IT support I received last week."})
Out[16]:
'Category: IT-support'

Structured output¶

TODO

  • Why structured output is needed
  • Explain what Pydantic is
  • Explain how Pydantic is used to get structured output

Pydantic¶

Pydantic is a Python library for data validation and data structuring. The main class in Pydantic is called BaseModel, and it is the class we inherit from when we create our own data models. When we inherit from BaseModel, we automatically get functionality that can:

  1. Validate the content you pass in
    • Example: You define a list of approved countries, "Norway, Sweden, Finland". Pydantic will then reject "Australia".
  2. Convert data to the correct type
    • Example: You define that the output should be an int and pass in '1'. Pydantic will then return 1 (as an int).
  3. Enforce JSON formatting
    • Ensures that the response from the LLM matches the JSON schema of the Pydantic model. This lets us be sure about the structure of the output, which is very useful when, for example, we want to use the output of one model as input to another.
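
The three points above can be tried out in a few lines. This is a minimal sketch that assumes Pydantic version 2 is installed. The model `Office` and its fields are hypothetical and used only for illustration.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Office(BaseModel):
    """Hypothetical model used only for this illustration."""

    country: Literal["Norway", "Sweden", "Finland"]  # only these values are accepted
    employees: int


# 1. Validation: "Australia" is not in the list of approved countries.
try:
    Office(country="Australia", employees=3)
    rejected = False
except ValidationError:
    rejected = True
print("Australia rejected:", rejected)

# 2. Type conversion: the string "12" is converted to the integer 12.
office = Office(country="Norway", employees="12")
print(office.employees, type(office.employees).__name__)

# 3. JSON schema: the structure that a model response can be checked against.
schema = Office.model_json_schema()
print(sorted(schema["properties"]))
```

The schema in the last step is a plain dictionary describing the allowed fields and types, so a response can be checked against it before it is passed on to the next step.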

Structured output¶

In this context, structured output refers to the strategy and tools we use to make sure our data is organized in a way we define in advance. Pydantic lets us determine the structure of the output by specifying fields, defining types (e.g. int, str, List[int]) and validating data. In this course we will use BaseModel to create our own classes, so that what we receive from the models when we prompt them comes in exactly the form we want.

In [19]:
from pydantic import BaseModel, Field

class Categorize(BaseModel):
    "Categorization of a single feedback entry from an IT survey."
    category : str = Field(description="The best fitting category. Only one.")

In the example above we created our own class 'Categorize', which inherits from BaseModel. For the variable category we used type hinting ( x: type ) to define the type Pydantic should expect category to have. The following, for example, is therefore not allowed:

In [18]:
Categorize(category=1)
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
/tmp/ipykernel_3761/2542644843.py in <module>
----> 1 Categorize(category=1)

~/.local/lib/python3.10/site-packages/pydantic/main.py in __init__(self, **data)
    210         # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    211         __tracebackhide__ = True
--> 212         validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    213         if self is not validated_self:
    214             warnings.warn(

ValidationError: 1 validation error for Categorize
category
  Input should be a valid string [type=string_type, input_value=1, input_type=int]
    For further information visit https://errors.pydantic.dev/2.9/v/string_type

Whereas this is perfectly fine:

In [20]:
Categorize(category="Network")
Out[20]:
Categorize(category='Network')

We also use the function Field, which you can use to set default values, validation rules and descriptions. In this course we only use descriptions, but those especially interested can read more about the functionality in the Pydantic documentation.

To make the LLM follow the rules we defined in our class, we use the wrapper with_structured_output(...). It wraps our call to the language model with logic that ensures the output follows the structure defined in the class.

In [21]:
categorize_chain_structured_output = categorize_prompt | llm.with_structured_output(
    Categorize,
    method="json_schema", # Enforce a JSON schema for the output
    strict=True           # The model must follow the schema to the letter: no extra fields, no missing fields, and every type must match exactly.
)

categorize_chain_structured_output.invoke(
    {"feedback": "I am very happy with the IT support I received last week."}
)
Out[21]:
Categorize(category='IT-support')

We can use Literal to define our own types for type hinting. We can type hint variables with CATEGORIES below, and the model will interpret it accordingly: just as an int may only be a whole number, this "type" may only be one of the values listed in the Literal object.

In [22]:
from typing import Literal

CATEGORIES = Literal[
    "Network",
    "Training",
    "IT-support",
    'Other'
]
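
To see what such a Literal type actually contains, the standard library function `typing.get_args` can list its allowed values. The sketch below repeats the CATEGORIES definition so that it runs on its own. The helper `is_valid_category` is hypothetical and only mimics, in a simplified way, the kind of membership check a validator performs.

```python
from typing import Literal, get_args

CATEGORIES = Literal["Network", "Training", "IT-support", "Other"]

# get_args returns the values listed inside the Literal as a tuple.
allowed = get_args(CATEGORIES)
print(allowed)


def is_valid_category(value: str) -> bool:
    """Return True only if the value is one of the allowed categories."""
    return value in allowed


print(is_valid_category("Network"))
print(is_valid_category("Security"))
```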

Let us put together what we have learned. Note that we no longer list the categories in the prompt (which in no way guarantees that we only get one of the categories we want), but instead define them as a requirement on the structure of the model's output.

In [23]:
from typing import Literal

categorize_prompt2 = PromptTemplate.from_template(
"""
Categorize the following feedback into the provided categories.

Feedback:
<feedback>
{feedback}
</feedback>
"""
)



class CategorizeFromOptions(BaseModel):
    "Categorization of a single feedback entry from an IT survey."
    category: CATEGORIES = Field(
        description="Chosen category for the feedback. Choose 'Other' if the other categories provided are not a good fit."  ## noqa: E501
    )


categorize_chain_structured_output2 = categorize_prompt2 | llm.with_structured_output(
    CategorizeFromOptions,
    method="json_schema",
    strict=True,
)
In [24]:
result = categorize_chain_structured_output2.invoke(
    {"feedback": "I am very happy with the IT support I received last week."}
)

result
Out[24]:
CategorizeFromOptions(category='IT-support')

Evaluate performance across whole dataset¶

In [25]:
def categorize_single_feedback(feedback: str) -> str:
    result = categorize_chain_structured_output2.invoke(
        {"feedback": feedback}
    )
    return result.category

df["AI Classification"] = df["Feedback"].apply(categorize_single_feedback)
In [ ]:
df
Out[ ]:
ID Category Feedback AI Classification
0 1 Training The recent training sessions on new software u... Training
1 2 Training The interactive training modules are well-desi... Training
2 3 Training I appreciate how the sessions cover both the b... Training
3 4 Training The hands-on exercises are engaging and boost ... Training
4 5 Training The training materials are succinct and creati... Training
5 6 Training Live workshops are full of energy and support,... Training
6 7 Training Although the sessions are structured to be int... IT-support
7 8 Training I find the training generally beneficial, yet ... Training
8 9 Training The curriculum is robust and adapts to emergin... Training
9 10 Training While the training sessions aim to cover a wid... Training
10 11 Network The office network's reliability is commendabl... Network
11 12 Network I appreciate the steady connectivity, although... Network
12 13 Network Network performance is generally satisfactory,... Network
13 14 Network The network infrastructure is robust and usual... Network
14 15 Network Connectivity is strong and dependable; however... Network
15 16 Network The network supports daily operations well, ye... Network
16 17 Network While the network operates effectively most ti... Network
17 18 IT-support The IT-support team is consistently prompt and... IT-support
18 19 IT-support I value IT-support’s clear follow-ups, yet del... IT-support
19 20 IT-support The assistance from IT-support is generally ef... IT-support
20 21 IT-support IT-support often exceeds expectations with tim... IT-support
21 22 IT-support Customer service from IT-support is engaging a... IT-support
22 23 IT-support The comprehensive responses from IT-support ar... IT-support
23 24 IT-support I appreciate IT-support’s readiness to tackle ... IT-support
24 25 IT-support While IT-support usually offers strong resolut... IT-support
25 26 Security The new cybersecurity measures give me confide... Other
26 27 Security I feel more secure with the recent network def... Network
27 28 Security The stringent security protocols inspire trust... Other
28 29 Security Enhanced monitoring tools fortify our security... Other
29 30 Security Timely security updates fortify our systems, e... Other
30 31 Security I appreciate the proactive approach to securit... Other
31 32 Security The security team usually implements robust me... Other
32 33 Security Our enhanced firewall settings significantly l... Other
33 34 Security Robust security protocols foster a safe system... Other
34 35 Security I value our firm stance on cybersecurity, even... Other
35 36 Business needs The IT solutions seem aligned with our evolvin... Other
36 37 Business needs Our systems generally support business functio... Other
37 38 Business needs The integration of technology with our busines... Other
38 39 Business needs I appreciate the efforts to tailor IT solution... IT-support
39 40 Business needs Our current infrastructure aligns with busines... Other
40 41 Business needs The balance between technical capability and b... Other
41 42 Business needs I recognize the efforts to merge IT and busine... Other
42 43 Quality of tools The quality of our primary tools is commendabl... Other
43 44 Quality of tools I’m impressed by the robustness of our softwar... IT-support
44 45 Quality of tools The suite of tools is modern and user-friendly... Other
45 46 Quality of tools Our tools maintain high quality with intuitive... Other
46 47 Quality of tools The digital tools consistently support day-to-... Other
47 48 Quality of tools I appreciate the blend of innovation and stabi... Other
48 49 Quality of tools Our advanced software tools offer a mix of eff... Other
49 50 Quality of tools The high quality of our system tools fosters p... Network
In [26]:
(df["AI Classification"] == df["Category"]).value_counts()

#TODO: Replace this by an evaluation function which is imported
Out[26]:
False    26
True     24
Name: count, dtype: int64
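
The overall count above hides where the mismatches come from. As a possible starting point for the evaluation function mentioned in the TODO, here is a sketch that reports the share of correct predictions per true category. The function name `accuracy_per_category` is hypothetical, and the sample labels are made up rather than taken from the survey.

```python
from collections import defaultdict


def accuracy_per_category(true_labels: list[str], predicted_labels: list[str]) -> dict[str, float]:
    """Share of correct predictions for each true category."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Both label lists must have the same length.")
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {category: correct[category] / total[category] for category in total}


# Tiny made-up sample, not the survey data.
truth = ["Training", "Training", "Network", "Security"]
predicted = ["Training", "IT-support", "Network", "Other"]
print(accuracy_per_category(truth, predicted))
```

With the DataFrame in this notebook, the inputs would be `df["Category"].tolist()` and `df["AI Classification"].tolist()`. A per-category view makes it visible that categories such as Security can never match, because the model was only offered Network, Training, IT-support and Other.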
In [27]:
others = df[df['AI Classification'] == "Other"].copy()
others
Out[27]:
ID Category Feedback AI Classification
25 26 Security The new cybersecurity measures give me confidence in our data protection, though frequently changing protocols sometimes cause confusion. Other
27 28 Security The stringent security protocols inspire trust, even if the regular system prompts interrupt workflow more than I'd like. Other
28 29 Security Enhanced monitoring tools fortify our security practices, though occasional false alarms trigger unnecessary concerns. Other
29 30 Security Timely security updates fortify our systems, even though brief periods of isolation during updates can be disconcerting. Other
30 31 Security I appreciate the proactive approach to security, though the multi-step authentication process sometimes seems overly complicated. Other
31 32 Security The security team usually implements robust measures successfully, yet sporadic warnings create moments of heightened alertness. Other
32 33 Security Our enhanced firewall settings significantly lower threats, though frequent reminders to update can feel intrusive. Other
33 34 Security Robust security protocols foster a safe system environment, but occasional delays in updates sometimes spark minor apprehension. Other
34 35 Security I value our firm stance on cybersecurity, even as a continuous stream of alerts occasionally turns reassurance into slight irritation. Other
35 36 Business needs The IT solutions seem aligned with our evolving business needs, though occasional mismatches in technology and strategy leave me seeking clearer direction. Other
36 37 Business needs Our systems generally support business functions effectively, yet sporadic delays in adapting to new trends sometimes create operational bottlenecks. Other
37 38 Business needs The integration of technology with our business goals is promising, though outdated processes occasionally clash with modern expectations. Other
39 40 Business needs Our current infrastructure aligns with business needs, yet occasional rigidity in legacy systems hinders innovative approaches. Other
40 41 Business needs The balance between technical capability and business strategy is well-considered, though occasional oversights in market adaptability create subtle frustrations. Other
41 42 Business needs I recognize the efforts to merge IT and business strategies, though sporadic mismatches sometimes lead to project delays. Other
42 43 Quality of tools The quality of our primary tools is commendable, though sporadic bugs and outdated interfaces sometimes temper initial enthusiasm. Other
44 45 Quality of tools The suite of tools is modern and user-friendly, though a few lagging behind expectations stir brief episodes of frustration. Other
45 46 Quality of tools Our tools maintain high quality with intuitive design and regular updates, even though rare performance lags can hinder productivity. Other
46 47 Quality of tools The digital tools consistently support day-to-day work, yet occasional interface inconsistencies open up space for minor critiques. Other
47 48 Quality of tools I appreciate the blend of innovation and stability in our tools, though sporadic integration issues sometimes test my patience. Other
48 49 Quality of tools Our advanced software tools offer a mix of efficiency and creativity, but the occasional downtime introduces a fleeting sense of dismay. Other
In [28]:
# We would like to use the reasoning model here, but it refuses to accept "reasoning_effort"
categorize_prompt_other = PromptTemplate.from_template(
"""
Categorize the following feedback from an IT-survey into the category that best describes the feedback.

Feedback:
<feedback>
{feedback}
</feedback>
"""
)

categorize_chain_structured_output_others = categorize_prompt_other | llm.with_structured_output(
    Categorize,
    method="json_schema",
    strict=True,
)

def categorize_single_feedback_other(feedback: str) -> str:
    result = categorize_chain_structured_output_others.invoke(
        {"feedback": feedback}
    )
    return result.category

# Find indices where AI Classification is 'Other'
other_indices = df[df['AI Classification'] == "Other"].index

# Apply classification function to only these rows and assign back correctly
df.loc[other_indices, "AI Classification"] = df.loc[other_indices, "Feedback"].apply(
    categorize_single_feedback_other
)

print("done")
done
In [ ]:
# Store the results of categorizing the "Other" rows in a separate variable that does not include the actual categories.
# It can then be used in the tasks further down without confusing the LLM about which categories it should recognize.
categorized_survey = df[['ID','Feedback','AI Classification']]
categorized_survey
Out[ ]:
ID Feedback AI Classification
0 1 The recent training sessions on new software u... Training
1 2 The interactive training modules are well-desi... Training
2 3 I appreciate how the sessions cover both the b... Training
3 4 The hands-on exercises are engaging and boost ... Training
4 5 The training materials are succinct and creati... Training
5 6 Live workshops are full of energy and support,... Training
6 7 Although the sessions are structured to be int... IT-support
7 8 I find the training generally beneficial, yet ... Training
8 9 The curriculum is robust and adapts to emergin... Training
9 10 While the training sessions aim to cover a wid... Training
10 11 The office network's reliability is commendabl... Network
11 12 I appreciate the steady connectivity, although... Network
12 13 Network performance is generally satisfactory,... Network
13 14 The network infrastructure is robust and usual... Network
14 15 Connectivity is strong and dependable; however... Network
15 16 The network supports daily operations well, ye... Network
16 17 While the network operates effectively most ti... Network
17 18 The IT-support team is consistently prompt and... IT-support
18 19 I value IT-support’s clear follow-ups, yet del... IT-support
19 20 The assistance from IT-support is generally ef... IT-support
20 21 IT-support often exceeds expectations with tim... IT-support
21 22 Customer service from IT-support is engaging a... IT-support
22 23 The comprehensive responses from IT-support ar... IT-support
23 24 I appreciate IT-support’s readiness to tackle ... IT-support
24 25 While IT-support usually offers strong resolut... IT-support
25 26 The new cybersecurity measures give me confide... Cybersecurity
26 27 I feel more secure with the recent network def... Network
27 28 The stringent security protocols inspire trust... Security
28 29 Enhanced monitoring tools fortify our security... Security
29 30 Timely security updates fortify our systems, e... Security
30 31 I appreciate the proactive approach to securit... Security
31 32 The security team usually implements robust me... Security
32 33 Our enhanced firewall settings significantly l... Security
33 34 Robust security protocols foster a safe system... Security
34 35 I value our firm stance on cybersecurity, even... Cybersecurity
35 36 The IT solutions seem aligned with our evolvin... Alignment of IT Solutions with Business Needs
36 37 Our systems generally support business functio... System Performance
37 38 The integration of technology with our busines... Process Improvement
38 39 I appreciate the efforts to tailor IT solution... IT-support
39 40 Our current infrastructure aligns with busines... Infrastructure and Systems
40 41 The balance between technical capability and b... Business Strategy
41 42 I recognize the efforts to merge IT and busine... Project Management
42 43 The quality of our primary tools is commendabl... Tool Quality
43 44 I’m impressed by the robustness of our softwar... IT-support
44 45 The suite of tools is modern and user-friendly... User Experience
45 46 Our tools maintain high quality with intuitive... Tool Quality
46 47 The digital tools consistently support day-to-... User Experience
47 48 I appreciate the blend of innovation and stabi... Tool Performance
48 49 Our advanced software tools offer a mix of eff... Software Performance
49 50 The high quality of our system tools fosters p... Network

Chain-of-thought¶

Chain of Thought (CoT) is a prompt engineering technique that helps language models solve tasks that require several reasoning steps. Instead of jumping straight to the answer, the model is guided through a logical, step-by-step process, which tends to give more precise and well-reasoned answers, especially on complex problems [1].

In other words, you can ask the model to "think out loud" and explain its steps before it delivers a final answer. You ask for this in the prompt you send.

Example of a prompt without CoT:

Prompt: "How many arms do Eline and Kaspara have?"

Answer: "4"

Example of a prompt with CoT:

Prompt: "How many arms do Eline and Kaspara have? Think step by step."

Answer: "One person has two arms. Two people means 2x2 = 4 arms. The answer is 4."

It can be advantageous to use CoT on complex tasks, because the accuracy of the model's output tends to increase when it is "allowed" to work through the problem. This often gives better results on logical tasks or tasks with several steps.

In addition, you can see how the model reasons, which makes it easier to evaluate the answer. It also becomes easier to see where things went wrong if the model answers incorrectly.
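
In code, CoT prompting can be as simple as appending an instruction to the question and then picking the final answer out of the longer reply. The sketch below shows one possible way to do this. The helper names and the 'Answer:' marker convention are assumptions made for this example, and the reply is made up rather than real model output.

```python
import re


def with_chain_of_thought(question: str) -> str:
    """Append a step-by-step instruction and ask for a clearly marked final answer."""
    return (
        f"{question}\n"
        "Think step by step, then finish with a line of the form 'Answer: <value>'."
    )


def extract_final_answer(reply: str):
    """Return the text after the last 'Answer:' marker, or None if it is missing."""
    matches = re.findall(r"Answer:\s*(.+)", reply)
    return matches[-1].strip() if matches else None


prompt_text = with_chain_of_thought("How many arms do Eline and Kaspara have?")
print(prompt_text)

# Made-up reply in the requested format, not real model output.
reply = "One person has two arms. Two people means 2 x 2 = 4 arms.\nAnswer: 4"
print(extract_final_answer(reply))
```

Asking for a marked final line keeps the reasoning available for inspection while still making the answer easy to use in later steps.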

Sources¶

[1] xxxx, Link: https://www.ibm.com/think/topics/chain-of-thoughts


TODO:

  • Explain what chain-of-thought is and why it can be useful
In [ ]:
class CategorizeCot(BaseModel):
    "Categorization of a single feedback entry from an IT survey."
    chain_of_thought: str = Field(
        description="Use this space to think through the categorization."
    )
    category: CATEGORIES = Field(
        description="Chosen category for the feedback. Choose 'Other' if the other categories provided are not a good fit."  ## noqa: E501
    )


categorize_chain_cot = categorize_prompt2 | llm.with_structured_output(
    CategorizeCot,
    method="json_schema",
    strict=True,
)
In [ ]:
categorize_chain_cot.invoke(df["Feedback"][0])
Out[ ]:
CategorizeCot(chain_of_thought="The feedback mentions training sessions and discusses the clarity of guidance provided, as well as a desire for more practical examples. This clearly relates to the category of 'Training' as it focuses on the effectiveness and content of training sessions.", category='Training')
In [ ]:
result.category
Out[ ]:
'IT-support'
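
To make it concrete what the structured output above contains, here is a standard-library sketch that parses a JSON response with the same two fields and checks the category against an allowed set. The `ALLOWED_CATEGORIES` values and the `parse_categorization` helper are assumptions for illustration; the real `CATEGORIES` type is defined earlier in the notebook, and `with_structured_output` takes care of parsing for you. Note that the reasoning field comes before the category, so a model that fills in the fields in order writes its reasoning before it commits to a label.

```python
import json
from dataclasses import dataclass

# Assumed category names for this sketch; the notebook's CATEGORIES may differ.
ALLOWED_CATEGORIES = {"Training", "IT-support", "Network", "Other"}


@dataclass
class Categorization:
    chain_of_thought: str  # reasoning first ...
    category: str          # ... then the chosen label


def parse_categorization(raw_json: str) -> Categorization:
    """Parse a JSON response and reject categories outside the allowed set."""
    data = json.loads(raw_json)
    result = Categorization(
        chain_of_thought=data["chain_of_thought"],
        category=data["category"],
    )
    if result.category not in ALLOWED_CATEGORIES:
        raise ValueError(f"Unknown category: {result.category!r}")
    return result


# Hard-coded example response; no model is called.
raw = json.dumps({
    "chain_of_thought": "The feedback is about training sessions.",
    "category": "Training",
})
parsed = parse_categorization(raw)
print(parsed.category)
```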

Reasoning models¶

Reasoning models, such as the o3-mini model available through Azure OpenAI, are language models that are specifically trained to think before they answer. Such models produce step-by-step reasoning before delivering the final answer. Reasoning models are well suited to tasks that require complex problem solving, logical thinking such as coding or mathematics, or tasks with several steps. They are also useful in situations where accuracy and explainability are important [2].

This may resemble CoT, but there is an important difference. CoT is a technique you apply through the prompt to make a language model better at reasoning; it is quick and flexible. Reasoning models, as mentioned, are a separate type of model suited to tasks that require precise and systematic thinking. On such tasks, a reasoning model will often give more precise answers than CoT prompting alone.
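
The distinction can be sketched in code: with a standard model you add the step-by-step instruction yourself, while a reasoning model already reasons internally and can be given the plain question. The set `REASONING_MODELS` and the helper `build_prompt` are hypothetical names for this illustration, and the standard model name is only a placeholder.

```python
# Hypothetical set of deployments that reason on their own; adjust to your setup.
REASONING_MODELS = {"o3-mini"}

COT_INSTRUCTION = "Think step by step."


def build_prompt(question: str, model_name: str) -> str:
    """Add a CoT instruction only when the model does not reason by itself."""
    if model_name in REASONING_MODELS:
        return question
    return f"{question} {COT_INSTRUCTION}"


question = "How many arms do Eline and Kaspara have?"
prompt_reasoning = build_prompt(question, "o3-mini")
prompt_standard = build_prompt(question, "some-standard-model")  # placeholder name

print(prompt_reasoning)
print(prompt_standard)
```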

Sources¶

[2] xxx, Link: https://platform.openai.com/docs/guides/reasoning?api-mode=chat

In [ ]:
# We have now assigned a suitable category to every feedback entry.
# Next, we want the model to summarize per category, so that we end up with a high-level overview.

from langchain_core.prompts import PromptTemplate

# Your turn (suggestion)
# Build an LCEL chain that takes the result from the previous task (feedback with category)
# and produces a summary per category using an LLM. Include structured output.

# Tip: Start with a function that builds the LCEL chain
def summary_chain(x, y):
    prompt = ___

    # Tip 2: Use structured output with method="function_calling"


# Solution
# NOTE: replace feedback_txt with the result from the previous task
def build_summary_chain(struktur, llm_model):
    # 1. Prompt Template
    prompt = PromptTemplate.from_template("""
You are a domain expert in internal IT operations and organizational analysis. You will be provided with a dataset containing qualitative feedback from employees in an IT company. 
Each row in the dataset represents a feedback entry and is associated with a specific category.

For each category, carefully:
1. Read and interpret the feedback entries assigned to that category.
2. Identify core themes, recurring patterns, and contrasting opinions within that category.
3. Evaluate the feedback logically: What are the likely underlying causes of recurring issues or praises? Are there signs of systemic problems, isolated incidents, or misaligned expectations?
4. Summarize each category in 3 to 6 bullet points, highlighting key sentiments (positive and negative), representative concerns or compliments, and any significant outliers

Present your findings in a clean, professional way with one section per category. 

This is the employee feedback data: {feedback_txt}
""")

    # 2. Structured output LLM
    structured_llm = llm_model.with_structured_output(struktur, method="function_calling")

    # 3. LCEL Chain
    chain = prompt | structured_llm

    return chain

chain = build_summary_chain(struktur, llm_model)
response = chain.invoke({"feedback_txt": feedback_txt})
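
The note about replacing `feedback_txt` leaves open how the categorized feedback becomes a single string. One possible approach, sketched below with only the standard library, is to group the entries by category and render them as plain text. The record layout (`category` and `feedback` keys), the made-up sample rows, and the helper name `format_feedback_by_category` are assumptions for illustration, not the notebook's actual data structure.

```python
from collections import defaultdict


def format_feedback_by_category(rows):
    """Group feedback texts by category and render one text section per category."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["category"]].append(row["feedback"])

    sections = []
    for category in sorted(grouped):  # sorted for a stable, reproducible order
        lines = [f"Category: {category}"]
        lines += [f"- {text}" for text in grouped[category]]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


# Made-up sample rows standing in for the categorized survey results.
rows = [
    {"category": "Training", "feedback": "Sessions were clear but fast."},
    {"category": "Network", "feedback": "Wi-Fi drops in meeting rooms."},
    {"category": "Training", "feedback": "More time for Q&A, please."},
]
feedback_txt = format_feedback_by_category(rows)
print(feedback_txt)
```

A string built this way could then be passed to the chain as `chain.invoke({"feedback_txt": feedback_txt})`.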
In [ ]:
# Extra task
# Use the summary to write a report that can be sent to leadership, with suggested changes
# to improve the results of next year's survey.

Suggested change to the blocks above: use the result from after the re-classification of "Other", and follow the setup we have used throughout¶

Right now it uses the categories we got from the LLM. When it categorizes "Other", we get many new and somewhat varying categories, and there are a lot of them. Should we just use it as an example, and also use the real categorization here?
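
One way to investigate the concern above is to count how often each category occurs and fold rare ones back into a catch-all. The sketch below assumes a plain list of category labels and a hypothetical threshold `min_count`; the labels are made up for illustration and the helper `collapse_rare_categories` is not part of the notebook.

```python
from collections import Counter


def collapse_rare_categories(labels, min_count=2, fallback="Other"):
    """Replace every label that occurs fewer than min_count times with the fallback."""
    counts = Counter(labels)
    return [label if counts[label] >= min_count else fallback for label in labels]


# Made-up labels of the kind an LLM might produce when re-classifying "Other".
labels = ["Training", "Network", "Cybersecurity", "Training", "Tool Quality", "Network"]
collapsed = collapse_rare_categories(labels)

print(Counter(collapsed))
```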

In [ ]:
# We have now assigned a suitable category to every feedback entry.
# Next, we want the model to summarize per category, so that we end up with a high-level overview.


# Your turn (suggestion)
# Build an LCEL chain that takes the result from the previous task (feedback with category)
# and produces a summary per category using an LLM. Include structured output.

# Tip 1: Write a good prompt! Use this to reflect on what the end goal is.
prompt = PromptTemplate.from_template("""
Hmm, how should I phrase this? Shouldn't some data be included here as well?
""")

# Tip 2: Can you create a class that inherits from BaseModel to make this easier?
class SummarizeFeedback(BaseModel):
    "Description..."
    summary : str = ...

# Tip 3: Time to build the chain
summary_chain_structured_output = ...

# Tip 4: Call the model with the dataset from the survey
summary = ...
In [ ]:
# Solution
# 1. Prompt Template
summary_prompt = PromptTemplate.from_template("""
You are a domain expert in internal IT operations and organizational analysis. You will be provided with a dataset containing qualitative feedback from employees in an IT company. 
Each row in the dataset represents a feedback entry and is associated with a specific category.

For each category, carefully:
1. Read and interpret the feedback entries assigned to that category.
2. Identify core themes, recurring patterns, and contrasting opinions within that category.
3. Evaluate the feedback logically: What are the likely underlying causes of recurring issues or praises? Are there signs of systemic problems, isolated incidents, or misaligned expectations?
4. Summarize each category in 3 to 6 bullet points, highlighting key sentiments (positive and negative), representative concerns or compliments, and any significant outliers

Present your findings in a clean, professional way with one section per category. 

This is the employee feedback data: {survey_results}
""")

# 2. Class for structured output
class SummarizeFeedback(BaseModel):
    "Summary of the different categories recognized in the feedback from an IT survey."
    summary : str = Field(
        description="For each category: Category name and 3-6 bullet points summarizing the category."
    )

# 3. Summary-chain
summary_chain_structured_output = summary_prompt | llm.with_structured_output(
    SummarizeFeedback,
    method="json_schema",
    strict=True,
)

# 4. Call the model with the categorized dataset from the survey
summary = summary_chain_structured_output.invoke({"survey_results": categorized_survey})
In [81]:
# A nice way to display the output from the model
from IPython.display import Markdown

display(Markdown(summary.summary))

Training¶

  • Positive feedback on the design and interactivity of training modules.
  • Hands-on exercises are appreciated for enhancing engagement and learning.
  • Some employees feel the breadth of topics covered could be improved.
  • Overall, training is seen as beneficial, but there are suggestions for more tailored content.

IT-support¶

  • IT-support is praised for promptness and effectiveness in resolving issues.
  • Clear communication and follow-ups are highlighted as strengths.
  • Some feedback indicates occasional delays in response times.
  • Overall, IT-support is viewed positively, with a few isolated concerns about consistency.

Network¶

  • The reliability and performance of the office network receive commendations.
  • Employees appreciate steady connectivity, though some mention occasional disruptions.
  • The network infrastructure is generally seen as robust and supportive of daily operations.
  • A few concerns about performance during peak usage times are noted.

Cybersecurity¶

  • Recent cybersecurity measures are viewed positively, enhancing employee confidence.
  • Employees appreciate proactive security updates and robust protocols.
  • There is a strong sentiment of trust in the security team's efforts.
  • Some feedback suggests a desire for more transparency regarding security measures.

Alignment of IT Solutions with Business Needs¶

  • IT solutions are generally seen as well-aligned with evolving business needs.
  • Employees appreciate efforts to tailor IT solutions to specific requirements.
  • There is recognition of the balance between technical capability and business strategy.
  • Some feedback indicates a need for ongoing adjustments to maintain alignment.

System Performance¶

  • Systems are reported to support business functions effectively.
  • Employees appreciate the integration of technology with business processes.
  • There are positive remarks about the overall performance of IT systems.
  • A few suggestions for improvements in system responsiveness are noted.

Process Improvement¶

  • Feedback indicates a recognition of ongoing efforts to improve processes.
  • Employees appreciate the focus on efficiency and effectiveness in IT operations.
  • Some suggestions for further enhancements in workflow are provided.

Tool Quality¶

  • The quality of software tools is highly regarded, with many praising their user-friendliness.
  • Employees appreciate the blend of innovation and stability in the tools provided.
  • There are positive remarks about the tools supporting day-to-day operations.
  • A few outliers express concerns about specific tool functionalities.

-----------------------------------------------------------------------------------¶

Report to leadership¶

In [96]:
# Solution
# 1. Prompt template
report_prompt = PromptTemplate.from_template("""
You are an expert HR and technical operations analyst. I will provide you with a dataset of employee feedback collected from an IT company.

Your task is to deeply analyze this feedback and generate a concise executive-level summary report in markdown format that includes:

1. Key Takeaways
Provide a short summary of the overall feedback in 3-5 bullet points. Focus only on the main issues or areas of satisfaction.
Include both positive and negative themes, but prioritize the most important and impactful points.
Limit each point to 1-2 sentences.
Before finalizing each point, take a moment to reflect on why each issue might be present (e.g., systemic problems, temporary issues, resource constraints, etc.)

2. Suggested Improvements
Based on the overall feedback, propose 2-3 high-level, actionable measures that the company could take to address the most pressing issues and enhance overall performance or satisfaction.
Each suggestion should be brief, directly tied to the feedback, and strategic in nature.
Think about short-term vs long-term solutions and consider the feasibility of each suggestion.

3. Output
Present your findings in a structured way with clear section headings, bullet points for easy scanning, and a concise, direct and professional tone suitable for leadership review.

This is the employee feedback data: {summary_text}
""")

# 2. Class for structured output
class RaportForLeadership(BaseModel):
    "Report for the leadership of an IT company on the results of an internal IT survey."
    snappy_title : str = Field(
        description="A fitting title for the report. Must begin with '# ' to ensure easy markdown formatting."
    )
    intro : str = Field(
        description="1 sentence describing the purpose of the report."
    )
    key_takeaways : str = Field(
        description="3-5 bullet points describing the key takeaways. Limit each point to 1-2 sentences."
    )
    suggested_improvements : str = Field(
        description="2-3 actionable measures for the company. Keep it brief."
    )
    outro: str = Field(
        description="1 sentence ending for the report. Be creative." 
    )

# 3. Report-chain
report_chain_structured_output = report_prompt | llm.with_structured_output(
    RaportForLeadership,
    method="json_schema",
    strict=True,
)

# 4. Call the model with the summary of the categories
report = report_chain_structured_output.invoke({"summary_text": summary.summary})
In [97]:
report
Out[97]:
RaportForLeadership(snappy_title='# Employee Feedback Analysis: Key Insights and Recommendations', intro='This report summarizes the findings from the recent employee feedback survey, highlighting key themes and actionable improvements.', key_takeaways='- **Training Effectiveness**: Employees appreciate the interactive design of training modules but desire a broader range of topics and more tailored content, indicating a potential gap in addressing diverse learning needs.\n- **IT Support Performance**: While IT support is generally praised for its promptness and effectiveness, occasional delays suggest a need for improved consistency in response times.\n- **Network Reliability**: The office network is largely reliable, though some disruptions during peak usage times point to potential infrastructure limitations that need addressing.\n- **Cybersecurity Confidence**: Recent cybersecurity measures have bolstered employee trust, yet there is a call for greater transparency regarding these protocols to enhance confidence further.\n- **Alignment with Business Needs**: IT solutions are well-aligned with business needs, but ongoing adjustments are necessary to maintain this alignment as the business evolves.', suggested_improvements='- **Enhance Training Programs**: Develop a more diverse and tailored training curriculum that addresses specific employee needs and interests, ensuring all staff feel adequately supported in their professional development.\n- **Improve IT Support Consistency**: Implement a tracking system for IT support requests to identify patterns in delays and allocate resources more effectively, ensuring timely responses across the board.\n- **Increase Transparency in Cybersecurity**: Regularly communicate updates and insights regarding cybersecurity measures to employees, fostering a culture of trust and awareness around security practices.', outro='By addressing these key areas, we can enhance employee satisfaction and operational efficiency, paving the way for a more engaged and productive workforce.')
In [99]:
display(Markdown(
    "\n\n".join([report.snappy_title,
                 report.intro,
                 '## Key-takeaways',report.key_takeaways,
                 '## Suggested improvements',report.suggested_improvements,
                 report.outro])
                 
))

Employee Feedback Analysis: Key Insights and Recommendations¶

This report summarizes the findings from the recent employee feedback survey, highlighting key themes and actionable improvements.

Key-takeaways¶

  • Training Effectiveness: Employees appreciate the interactive design of training modules but desire a broader range of topics and more tailored content, indicating a potential gap in addressing diverse learning needs.
  • IT Support Performance: While IT support is generally praised for its promptness and effectiveness, occasional delays suggest a need for improved consistency in response times.
  • Network Reliability: The office network is largely reliable, though some disruptions during peak usage times point to potential infrastructure limitations that need addressing.
  • Cybersecurity Confidence: Recent cybersecurity measures have bolstered employee trust, yet there is a call for greater transparency regarding these protocols to enhance confidence further.
  • Alignment with Business Needs: IT solutions are well-aligned with business needs, but ongoing adjustments are necessary to maintain this alignment as the business evolves.

Suggested improvements¶

  • Enhance Training Programs: Develop a more diverse and tailored training curriculum that addresses specific employee needs and interests, ensuring all staff feel adequately supported in their professional development.
  • Improve IT Support Consistency: Implement a tracking system for IT support requests to identify patterns in delays and allocate resources more effectively, ensuring timely responses across the board.
  • Increase Transparency in Cybersecurity: Regularly communicate updates and insights regarding cybersecurity measures to employees, fostering a culture of trust and awareness around security practices.

By addressing these key areas, we can enhance employee satisfaction and operational efficiency, paving the way for a more engaged and productive workforce.
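
The assembly step in the display cell can also be wrapped in a small helper. This is a sketch that uses a plain dataclass as a stand-in for the structured output object; `ReportSections` and `to_markdown` are hypothetical names, and the sample text is made up. The helper also adds the `# ` prefix when the title lacks it, on the assumption that a field description guides the model but does not guarantee the format.

```python
from dataclasses import dataclass


@dataclass
class ReportSections:
    """Stand-in for the structured report object (same field names)."""
    snappy_title: str
    intro: str
    key_takeaways: str
    suggested_improvements: str
    outro: str


def to_markdown(report: ReportSections) -> str:
    """Join the report fields into one markdown document with section headings."""
    title = report.snappy_title
    if not title.startswith("# "):
        # Normalize to a single top-level heading.
        title = "# " + title.lstrip("# ")
    parts = [
        title,
        report.intro,
        "## Key takeaways",
        report.key_takeaways,
        "## Suggested improvements",
        report.suggested_improvements,
        report.outro,
    ]
    return "\n\n".join(parts)


# Made-up sample content for the sketch.
sample = ReportSections(
    snappy_title="Survey results",
    intro="Purpose.",
    key_takeaways="- Point A",
    suggested_improvements="- Action B",
    outro="Thanks.",
)
markdown_text = to_markdown(sample)
print(markdown_text)
```

In the notebook, a string like `markdown_text` could be rendered with `display(Markdown(markdown_text))`.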

-----------------------------------------------------------------------------------¶

How could we have done this better?¶

Suppose a stressed middle manager has dropped the results of this survey in your lap and asked you for a report he can present to leadership. Given the tools you have been introduced to in this course (and perhaps other experience?), how would you solve the task?

For example, do you see anything that could be improved in

  • The order in which we look for categories?
  • Whether we put too much or too little weight on the input we got about what leadership "thinks" the categories will be?
  • The prompting?
  • The variables or the type hinting in the pydantic classes?

Would you perhaps have done it completely differently? Now's your chance to try!

In [ ]:
# Try it out :))