Submitted:
21 January 2025
Posted:
22 January 2025
You are already at the latest version
Abstract
Background/Objectives: With advancements in Large Language Models (LLMs), counseling chatbots are becoming vital tools for delivering scalable and accessible mental health support. Traditional evaluation scales, however, fail to adequately capture the sophisticated capabilities of these systems, such as personalized interactions, empathetic responses, and memory retention. This study aims to design a robust and comprehensive evaluation scale, the Comprehensive Evaluation Scale for LLM-Powered Counseling Chatbots (CES-LCC), using the eDelphi method to address this gap. Methods: A panel of 16 experts in psychology, artificial intelligence, human-computer interaction, and digital therapeutics participated in two iterative eDelphi rounds. The process focused on refining dimensions and items based on qualitative and quantitative feedback. Initial validation, conducted after assembling the final version of the scale, involved 49 participants using the CES-LCC to evaluate an LLM-powered chatbot delivering Self-Help Plus (SH+), an Acceptance and Commitment Therapy-based intervention for stress management.Results: The final version of the CES-LCC features 27 items grouped into nine dimensions: Understanding Requests, Providing Helpful Information, Clarity and Relevance of Responses, Language Quality, Trust, Emotional Support, Guidance and Direction, Memory, and Overall Satisfaction. Initial real-world validation revealed high internal consistency (Cronbach’s alpha = 0.94), although minor adjustments are required for specific dimensions, such as Clarity and Relevance of Responses.
Keywords:
1. Introduction
1.1. Background
1.2. Related Works
1.3. Aim
2. Materials and Methods
2.1. Participants
2.1. Procedure
2.2.1. Preparatory Phase
First Round
Second Round
2.2.3. Data Processing and Analysis
2.2.4. Conclusion and Reporting
2.3. Initial Validation in Real-World
3. Results
3.1. Demographic Description of Experts
3.2. First Round
3.3. Second Round
3.4. Initial Validation
4. Discussion
4.1. Implications
4.2. Limitations and Future Research
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix A.1. Round 1 Results Overview (Items)Appendix A.2. Round 1 Qualitative Feedback (General and New Dimensions)Appendix A.3. Round 1 decision
| Dim | Item | Relevance | Priority | Sample QL Feedback | ||||||
| % Agree | M (SD) | Mdn (R) | IQR | M (SD) | Mdn (R) | IQR | ||||
| UR | The chatbot consistently understands what I am saying or asking. | 100.00 | 4.88 (0.34) | 5 (4-5) | 0.00 | 4.88 (0.34) | 5 (4-5) | 0.00 | - I’d add another item about the tone (e.g. The chatbot understands the tone of my request)- I believe an important feature is not only that it understands but also infers from the sentence | |
| I have to rephrase my requests very often for the chatbot to understand. | 62.50 | 3.69 (1.14) | 4 (1-5) | 1.25 | 3.88 (0.89) | 4 (3-5) | 2.00 | N/A | ||
| PHI | The chatbot provides accurate and helpful information. | 87.50 | 4.44 (1.09) | 5 (1-5) | 1.00 | 4.38 (1.15) | 5 (1-5) | 1.00 | - I would divide the question into two separate questions. Answers might be accurate, but not helpful and vice versa, so 1) the chatbot provides accurate information; 2) the chatbot provides helpful information- I think it's important to add that the information it provides is grounded in theoretical frameworks and scientific literature. | |
| The chatbot often provides incorrect or incomplete information. | 75.00 | 3.94(1.18) | 4 (1-5) | 1.25 | 3.69 (1.25) | 4 (1-5) | 0.75 | N/A | ||
| CRR | The chatbot's responses are clear, concise, and easy to understand. | 81.25 | 4.06 (1.00) | 4 (2-5) | 1.00 | 4.00 (1.03) | 4 (2-5) | 1.25 | - Conciseness and easiness are very different dimensions, I don't feel like they should be evaluated together- I’d add an item about verbosity (e.g. The chatbot adds superfluous information related to the query) | |
| The chatbot's responses are confusing or irrelevant to my questions. | 81.25 | 4.00 (1.32) | 4 (1-5) | 1.00 | 3.94 (1.12) | 4 (1-5) | 1.00 | - I’d split the item in two (confusing/irrelevant) | ||
| EUI | The chatbot is easy to interact with. | 62.50 | 3.94 (1.00) | 4 (2-5) | 2.00 | 3.63 (1.31) | 3.5 (1-5) | 2.00 | - I have no better way. My doubt is about the definition of easy. What does it mean? How can someone evaluate this dimension? | |
| Using the chatbot is frustrating or requires too many tentative interactions. | 62.50 | 3.81 (1.11) | 4 (1-5) | 2.00 | 3.56 (1.21) | 3.5 (2-5) | 2.25 | - The item is not very clear; does it refer to access or the actual internal use of the LLM? | ||
| LQ | The chatbot uses correct grammar and spelling in its responses. | 75.00 | 3.88 (1.02) | 4 (1-5) | 0.50 | 3.50 (1.21) | 4 (1-5) | 1.00 | N/A | |
| The chatbot's language style is natural and appropriate for the context. | 68.75 | 4.00 (0.97) | 4 (2-5) | 2.00 | 3.50 (1.37) | 3.5 (1-5) | 2.00 | - I would divide this item into 2 items: 1) The chatbot's language style is/seems natural; 2) The chatbot's language is appropriate for the context. | ||
| T | I believe the chatbot has my best interests at heart. | 43.75 | 3.13 (1.50) | 3 (1-5) | 2.25 | 3.13 (1.50) | 3 (1-5) | 2.25 | - I would avoid formulations that suggest that the chatbot might have a cognition.- I believe that it is tricky to talk about the chatbot as if it has agency and conscience also because it might lead to overreliance and/or excessive anthropomorphization of the tool. | |
| I am willing to rely on the chatbot in the future. | 62.50 | 3.81 (0.91) | 4 (2-5) | 1.25 | 3.50 (1.15) | 3.5 (2-5) | 1.50 | N/A | ||
| ES | The chatbot makes me feel heard and understood. | 75.00 | 4.06 (1.12) | 4 (1-5) | 1.25 | 3.94 (1.18) | 4 (1-5) | 1.25 | - I’d add an item about sense of humor (e.g. The chatbot shows to have a sense of humor when required) | |
| The chatbot's responses feel empathetic and supportive. | 87.50 | 4.19 (0.98) | 4 (2-5) | 1.00 | 4.00 (0.89) | 4 (1-5) | 0.00 | - I’d add an item about feeling reassured (e.g. The chatbot's inputs and responses can make me feel reassured) | ||
| GD | The chatbot provides helpful advice and suggestions for coping with my problems. | 93.75 | 4.38 (0.62) | 4 (3-5) | 1.00 | 4.25 (0.68) | 4 (3-5) | 1.00 | - This question might crossload into the “Providing helpful information” factor. However, I would keep it, because in this case it specifically talks about coping, but maybe phrasing it as follows: The chatbot provides adjusted guidance in coping with my problems. | |
| The chatbot encourages me to take positive steps towards my goals. | 75.00 | 3.88 (0.81) | 4 (2-5) | 0.25 | 3.81 (0.83) | 4 (2-5) | 1.00 | - I believe it is important to assess an individual's goals carefully. For example, a person with an eating disorder might set a goal to lose an extreme amount of weight, which is unhealthy. Therefore, it's crucial to remember that a patient's goals are not always the best for their well-being. | ||
| OS | I am overall satisfied with the usability of this chatbot. | 87.50 | 4.44 (0.73) | 5 (3-5) | 1.00 | 4.44 (0.73) | 5 (3-5) | 1.00 | - I want to point out that not only the usability but also the effectiveness in helping is important. | |
| I would not recommend this chatbot to others due to usability issues. | 75.00 | 3.88 (1.15) | 4 (1-5) | 1.25 | 3.88 (1.31) | 4 (1-5) | 2.00 | - This statement sounds somehow redundant to the first one in terms that lower scores on statement 1 seem to be almost equivalent to high | ||
Appendix A.2. Round 1 Qualitative Feedback (General and New Dimensions)
| Feedback type | Content |
| New dimensions |
- I think assessing memory quality is crucial when dealing with real world implementations using LLMs. So, I suggest adding this dimension to the assessment. - I believe there are missing items related to the perception of how the chatbot handles my privacy and data security, such as how it shares my personal information with third parties. - In my opinion, privacy and data security, especially regarding how the chatbot shares personal information, are important aspects not included in the items. However, these seem more tied to production or implementation and might be better addressed in a separate, dedicated evaluation. - Items related to data privacy and security might be relevant in this scenario. However, in my experience, these items are more aligned with production or implementation processes and might be better addressed under regulations like the EU AI Act and GDPR. |
| General |
- Almost all the statements sound actually very relevant, I provided some lower scores in some of them just to distinguish the ones I think are most relevant but in general all are relevant! - I felt like all of the shown questions were relevant in some way: that's why some evaluations were a bit harsh, just so that I could express what is more relevant from my point of view. Anyway, the questions were all pretty clear - An important consideration for real-life implementation of LLM-powered chatbots is ensuring accessibility for a wide range of users. This includes compatibility with various devices, such as smartphones, tablets, and computers, to meet diverse user needs. Additionally, designing the chatbot to be inclusive is crucial—for example, allowing users to specify preferred names and pronouns to support transgender and gender-diverse individuals, and incorporating features like colorblind-friendly graphics or text presentation options to assist users with visual impairments or reading difficulties. These steps can significantly enhance user experience and inclusivity. - Psychometrically, factors should have at least 3 items to be considered reliable, with 2 items it is not even possible to calculate internal consistency |
Appendix A.3. Round 1 decision
| Dim | Add (Motivation) | Modify (Motivation) | Drop (Motivation) |
| UR | - “The chatbot understands the tone of my request” (QL Feedback) - “The chatbot asks specific questions to better understand my requests” (QL Feedback) - “The chatbot infers information from my messages” (QL Feedback) |
Nothing | - “I have to rephrase my requests very often for the chatbot to understand.” (% Agree Relevance) |
| PHI | - “The chatbot provides information grounded in theory and scientific literature.” (QL Feedback) - “The chatbot provides references.” (QL Feedback) |
- Split the item “The chatbot provides accurate and helpful information.” into “The chatbot provides accurate information” and “The chatbot provides helpful information” (QL Feedback) - Split the item “The chatbot often provides incorrect or incomplete information.” into “The chatbot often provides incorrect information.” and “The chatbot often provides incomplete information.” (QL Feedback) |
Nothing |
| CRR | - “The chatbot adds superfluous information related to the query” (QL Feedback) | - Split the item “The chatbot's responses are clear, concise, and easy to understand.” into two different items “The chatbot's responses are clear, and easy to understand”, “The chatbot's responses are adequately concise” (QL Feedback) - Split the item “The chatbot's responses are confusing or irrelevant to my questions.” into “The chatbot's responses are confusing” and “The chatbot's responses are irrelevant to my questions.” (QL Feedback) |
Nothing |
| EUI | Nothing | Nothing | - The entire dimension (% Agree Relevance, IQR Relevance, QL Feedback) |
| LQ | Nothing | - Split the item “The chatbot's language style is natural and appropriate for the context.” into “The chatbot's language style is/seems natural” and “The chatbot's language is appropriate for the context.” (% Agree Relevance, IQR Relevance, QL Feedback) | Nothing |
| T | - “I feel safe sharing my personal matters with the chatbot” (QL Feedback) - “I believe that the feedback/information provided by the chatbot are trustworthy” (QL Feedback) - “I believe the chatbot is transparent about its limitations and capabilities.” (QL Feedback) |
Nothing | - “I believe the chatbot has my best interests at heart” (% Agree Relevance, IQR Relevance, QL Feedback) - “I am willing to rely on the chatbot in the future.” (% Agree Relevance) |
| ES | - “The chatbot's responses can make me feel reassured” (QL Feedback) - “The chatbot shows to have a sense of humor when required” (QL Feedback) |
Nothing | Nothing |
| GD | - “The chatbot helps me set realistic and achievable goals.” (QL Feedback) | - Modify “The chatbot provides helpful advice and suggestions for coping with my problems.” into “The chatbot provides adjusted guidance in coping with my problems” to avoid cross loading with other factor (QL Feedback) - Modify “The chatbot encourages me to take positive steps.” (QL Feedback) |
Nothing |
| OS | - “I am overall satisfied with the effectiveness of this chatbot” (QL Feedback) - “I feel that my interactions with the chatbot were worthwhile.” (QL Feedback) |
Nothing | - “I would not recommend this chatbot to others due to usability issues.” (Redundancy, QL Feedback) |
| M [New] | - “The chatbot accurately recalls key details from previous conversations.” (QL Feedback) - “The chatbot maintains consistency by integrating past interactions into current responses.” (QL Feedback) - “The chatbot adapts its advice based on information provided in earlier sessions.” (QL Feedback) |
Nothing | Nothing |
Appendix B
Appendix B.1. Round 2 Results Overview (Items) Appendix B.2. Round 2 Decision
| Dim | Item (Italian) | Relevance | Redund | Priority | Translation Qual | Sample QL Feedback | |||||
| % Agree | M (SD) | Mdn (R) | IQR | Flags | Points | % Agree | M (SD) | ||||
| UR | The chatbot consistently understands what I am saying or asking. (Il chatbot capisce sempre ciò che sto dicendo o chiedendo.) |
100.00 | 4.87 (0.35) | 5 (4-5) | 0.00 | 1 | 56 | 73.33 | 4.73 (0.47) |
- I’d prefer "The chatbot consistently understands what I am saying AND asking". The "or" makes it hard to trust high scores. [Content] - Toglierei il "sempre" che in italiano potrebbe inserire un dubbio invece che rafforzare [Translation] |
|
| The chatbot understands the tone of my request. (Il chatbot capisce il tono della mia richiesta.) |
73.33 | 4.07 (0.80) | 4 (3-5) | 1.50 | 3 | 27 | 73.33 | 4.67 (0.65) | - It is not clear to me what "understanding the tone" means here [Content] | ||
| The chatbot asks specific questions to better understand my requests. (Il chatbot fa domande specifiche per capire meglio le mie richieste.) |
86.67 | 4.20 (0.68) | 4 (3-5) | 1.00 | 0 | 30 | 80.00 | 4.92 (0.29) | N/A | ||
| The chatbot infers information from my messages. (Il chatbot inferisce informazioni dai miei messaggi.) |
80.00 | 4.43 (0.94) | 5 (2-5) | 1.00 | 2 | 37 | 60.00 | 4.25 (1.06) |
- The term "infer" is a bit ambiguous; I would suggest revising it as follows: "The chatbot is able to make adequate inferences based on my messages." [Content] - Il chatbot deduce informazioni dai miei messaggi [Translation] |
||
| PHI | The chatbot provides accurate information. (Il chatbot fornisce informazioni accurate.) |
86.67 | 4.47 (1.09) | 5 (2-5) | 1.00 | 2 | 81 | 80.00 | 4.83 (0.39) | N/A | |
| The chatbot provides helpful information. (Il chatbot fornisce informazioni utili.) |
100.00 | 4.80 (0.41) | 5 (4-5) | 0.00 | 0 | 68 | 80.00 | 4.92 (0.29) | N/A | ||
| The chatbot often provides incorrect information. (Il chatbot fornisce spesso informazioni errate.) |
80.00 | 4.13 (1.13) | 4 (1-5) | 1.00 | 4 | 52 | 80.00 | 4.92 (0.29) | N/A | ||
| The chatbot often provides incomplete information. (Il chatbot fornisce spesso informazioni incomplete.) |
73.33 | 4.00 (0.76) | 4 (3-5) | 1.00 | 1 | 53 | 80.00 | 4.92 (0.29) | N/A | ||
| The chatbot provides information grounded in theory and scientific literature. (Il chatbot fornisce informazioni basate su teorie e letteratura scientifica.) |
80.00 | 4.13 (1.13) | 4 (1-5) | 1.00 | 3 | 35 | 73.33 | 4.50 (0.67) | - Il chatbot fornisce informazioni supportate da teorie e letteratura [Translation] | ||
| The chatbot provides references. (Il chatbot fornisce riferimenti bibliografici.) |
66.67 | 3.60 (1.06) | 4 (1-5) | 1.00 | 5 | 26 | 60.00 | 4.42 (0.90) |
- I don't think it's crucial for users to have a research paper attached to questions such as "I feel bad lately, I can't sleep". It would make the UX poorer in my opinion. This would make more sense if you are building a search engine kind of system. [Content] - Il chatbot fornisce riferimenti alle fonti utilizzate [Translation] |
||
| CRR | The chatbot's responses are clear, and easy to understand. (Le risposte del chatbot sono chiare e facili da capire.) |
100.00 | 4.93 (0.26) | 5 (4-5) | 0.00 | 0 | 57 | 73.33 | 4.75 (0.62) | - Le risposte del chatbot sono chiare e semplici da capire [Translation] | |
| The chatbot's responses are adequately concise. (Le risposte del chatbot sono sufficientemente concise.) |
80.00 | 4.13 (0.74) | 4 (3-5) | 1.00 | 1 | 53 | 80.00 | 4.75 (0.45) | N/A | ||
| The chatbot's responses are confusing. (Le risposte del chatbot sono confondenti.) |
93.33 | 4.07 (0.96) | 4 (1-5) | 0.50 | 5 | 50 | 53.33 | 3.83 (1.19) |
- This is just the reverse of clear [Content] - Le risposte del chat mi confondono [Translation] |
||
| The chatbot's responses are irrelevant to my questions. (Le risposte del chatbot non sono pertinenti alle mie domande.) |
100.00 | 4.73 (0.46) | 5 (4-5) | 0.50 | 1 | 42 | 80.00 | 4.92 (0.29) | N/A | ||
| The chatbot adds superfluous information related to the query. (Il chatbot aggiunge informazioni superflue relative alla richiesta.) |
53.33 | 3.47 (0.92) | 4 (1-5) | 1.00 | 8 | 23 | 60.00 | 4.45 (1.29) | - Il chatbot aggiunge informazioni superflue rispetto alla richiesta. [Translation] | ||
| LQ | The chatbot uses correct grammar and spelling in its responses. (Il chatbot fornisce risposte con grammatica e ortografia corrette.) |
80.00 | 3.73 (1.22) | 4 (1-5) | 0.00 | 2 | 34 | 60.00 | 4.42 (0.90) | - Il chatbot fornisce risposte grammaticalmente e ortograficamente corrette. [Translation] | |
| The chatbot's language style is/seems natural. (Lo stile linguistico del chatbot è/sembra naturale.) |
86.67 | 4.47 (0.74) | 5 (3-5) | 1.00 | 0 | 23 | 66.67 | 4.67 (0.78) |
- “The chatbot's language style sounds natural” seems more fluent [Content] - Lo stile linguistico del chatbot suona naturale [Translation] |
||
| The chatbot's language is appropriate for the context. (Il linguaggio del chatbot è appropriato per il contesto.) |
86.67 | 4.57 (0.65) | 5 (3-5) | 1.00 | 0 | 33 | 73.33 | 4.58 (0.67) | - Il linguaggio del chatbot è appropriato al contesto. [Translation] | ||
| T | I feel safe sharing my personal matters with the chatbot. (Mi sento al sicuro nel condividere questioni personali con il chatbot.) |
93.33 | 4.53 (1.06) | 5 (1-5) | 0.50 | 0 | 37 | 80.00 | 4.75 (0.45) | N/A | |
| I believe the chatbot is transparent about its limitations and capabilities. (Credo che il chatbot sia trasparente riguardo alle sue limitazioni e capacità.) |
73.33 | 4.13 (0.83) | 4 (3-5) | 1.50 | 0 | 21 | 53.33 | 4.18 (1.08) | - Credo che il chatbot sia trasparente riguardo ai suoi limiti e alle sue capacità [Translation] | ||
| I believe that the feedback/information provided by the chatbot are trustworthy. (Credo che i feedback/le informazioni fornite dal chatbot siano affidabili.) |
93.33 | 4.67 (0.82) | 5 (2-5) | 0.00 | 2 | 32 | 66.67 | 4.90 (0.32) | - Mettere una e invece che la slash / [Translation] | ||
| ES | The chatbot makes me feel heard and understood. (Il chatbot mi fa sentire ascoltato e capito.) |
86.67 | 4.20 (1.08) | 4 (1-5) | 1.00 | 1 | 53 | 66.67 | 4.80 (0.42) | - I would drop this. I feel like this evaluates how the system can trick the user in terms of feeling like they are talking to someone that listens to them and understands, while an LLM obviously cannot do that. [Content] | |
| The chatbot's responses feel empathetic and supportive. (Le risposte del chatbot sembrano empatiche e di supporto.) |
93.33 | 4.60 (0.63) | 5 (3-5) | 1.00 | 1 | 43 | 53.33 | 4.10 (1.29) |
- I think this is different to the previous one, because it focuses on the "look" of the answers more than on the ability of convincing the user of something. This is something that makes sense to evaluate I think [Content] - “e supportive” invece che “di supporto” [Translation] |
||
| The chatbot's responses can make me feel reassured (Le risposte del chatbot sono in grado di farmi sentire rassicurato.) |
80.00 | 4.20 (0.77) | 4 (3-5) | 1.00 | 3 | 34 | 60.00 | 4.70 (0.67) | N/A | ||
| The chatbot shows to have a sense of humor when required (Il chatbot dimostra di avere senso dell'umorismo quando necessario.) |
60.00 | 3.20 (1.42) | 4 (1-5) | 2.00 | 4 | 20 | 60.00 | 4.70 (0.67) | N/A | ||
| GD | The chatbot provides adjusted guidance in coping with my problems. (Il chatbot mi fornisce indicazioni adeguate per affrontare i problemi che riporto.) |
86.67 | 4.53 (0.74) | 5 (3-5) | 1.00 | 0 | 38 | 46.67 | 3.90 (1.29) |
- Nel tradurre coping suggerirei di dire "per gestire" invece che per affrontare [Translation] - Il chatbot fornisce indicazioni adeguate per affrontare i miei problem [Translation] |
|
| The chatbot helps me set realistic and achievable goals. (Il chatbot mi aiuta a stabilire obiettivi realistici e raggiungibili.) |
100.00 | 4.53 (0.52) | 5 (4-5) | 1.00 | 1 | 25 | 66.67 | 4.80 (0.42) | N/A | ||
| The chatbot encourages me to take positive steps. (Il chatbot mi incoraggia a compiere sforzi per il mio benessere.) |
86.67 | 4.27 (1.03) | 5 (2-5) | 1.00 | 2 | 27 | 53.33 | 3.82 (0.87) |
- Il chatbot mi incoraggia a compiere azioni costruttive. [Translation] - Il chatbot mi incoraggia a compiere passi positivi [Translation] |
||
| M | The chatbot accurately recalls key details from previous conversations. (Il chatbot ricorda accuratamente i dettagli chiave delle conversazioni precedenti.) |
100.00 | 4.73 (0.46) | 5 (4-5) | 0.50 | 1 | 39 | 60.00 | 4.55 (0.82) | - I would delate "key", to not make it seem like if the chatbot can understand personal saliance, but rather its capacity to recall information at large this item is important. [Content] | |
| The chatbot maintains consistency by integrating past interactions into current responses. (Il chatbot integra coerentemente le interazioni passate nelle risposte attuali.) |
93.33 | 4.80 (0.56) | 5 (3-5) | 0.00 | 4 | 26 | 53.33 | 4.50 (0.85) |
- Il chatbot è coerente ed integra le interazioni passate nelle risposte attuali. [Translation] - Il chatbot integra coerentemente le interazioni passate nelle risposte [Translation] |
||
| The chatbot adapts its advice based on information provided in earlier sessions. (Il chatbot adatta i suoi consigli in base alle informazioni fornite nelle sessioni precedenti.) |
93.33 | 4.67 (0.62) | 5 (3-5) | 0.50 | 6 | 25 | 73.33 | 4.82 (0.40) | N/A | ||
| OS | I am overall satisfied with the usability of this chatbot. (Sono complessivamente soddisfatto dell'usabilità di questo chatbot.) |
93.33 | 4.53 (0.64) | 5 (3-5) | 1.00 | 0 | 39 | 73.33 | 4.64 (0.50) | - Nel complesso, sono soddisfatto dell'usabilità di questo chatbot [Translation] | |
| I feel that my interactions with the chatbot were worthwhile. (Trovo che le mie interazioni con il chatbot siano state utili.) |
86.67 | 4.20 (0.68) | 4 (3-5) | 1.00 | 3 | 27 | 66.67 | 4.73 (0.65) | - Trovo che le mie interazioni con il chatbot siano state proficue [Translation] | ||
| I am overall satisfied with the effectiveness of this chatbot. (Sono complessivamente soddisfatto dell'efficacia di questo chatbot.) |
73.33 | 4.20 (1.01) | 5 (2-5) | 1.50 | 2 | 24 | 60.00 | 4.70 (0.67) | - Nel complesso, sono soddisfatto... [Translation] | ||
Appendix B.2. Round 2 Decision
| Dim | Add (Motivation) | Modify (Motivation) | Drop (Motivation) |
| UR | Nothing | - Rephrase both the italian translation and the original item (“The chatbot consistently understands what I am saying or asking.”) into: “The chatbot consistently understands what I am saying and asking” and “Il chatbot capisce ciò che sto dicendo e chiedendo.” (% Agree Translation, QL Feedback) - Rephrase both the italian translation and the original item (“The chatbot infers information from my messages.”) into: “The chatbot is able to make adequate inferences based on my messages.” and “Il chatbot è in grado di fare deduzioni appropriate basandosi sui miei messaggi.” (% Agree Translation, QL Feedback) |
- “The chatbot understands the tone of my request.” (% Agree Relevance, QL Feedback) |
| PHI | Nothing | - Rephrase the italian version of the item “The chatbot provides information grounded in theory and scientific literature.” into “Il chatbot fornisce informazioni supportate da teorie e letteratura scientifica.” (% Agree Translation, QL Feedback) | - The chatbot often provides incorrect information. (Redundancy) - The chatbot often provides incomplete information. (% Agree Relevance) - The chatbot provides references. (% Agree Relevance, Redundancy) |
| CRR | Nothing | - Rephrase the italian version of the item “The chatbot's responses are clear, and easy to understand.” into “Le risposte del chatbot sono chiare e semplici da capire.” (% Agree Translation, QL Feedback) | - The chatbot's responses are confusing. (Redundancy, QL Feedback) - The chatbot adds superfluous information related to the query. (% Agree Relevance, Redundancy, QL Feedback) |
| LQ | Nothing | - Rephrase the italian version of the item “The chatbot uses correct grammar and spelling in its responses.” into “Il chatbot fornisce risposte grammaticalmente e ortograficamente corrette.” (% Agree Translation, QL Feedback) - Rephrase both the italian translation and the original item (“The chatbot's language style is/seems natural.”) into: “The chatbot's language style sounds natural.” and “Lo stile linguistico del chatbot suona naturale.” (% Agree Translation, QL Feedback) - Rephrase the italian version of the item “The chatbot's language is appropriate for the context.” into “Il linguaggio del chatbot è appropriato al contesto.” (% Agree Translation, QL Feedback) |
Nothing |
| T | Nothing | - Rephrase the italian version of the item “I believe the chatbot is transparent about its limitations and capabilities.” into “Credo che il chatbot sia trasparente riguardo ai suoi limiti e alle sue capacità” (% Agree Translation, QL Feedback) - Rephrase both the italian translation and the original item (“I believe that the feedback/information provided by the chatbot are trustworthy.”) into: “I believe that the feedback and the information provided by the chatbot are trustworthy.” and “Credo che i feedback e le informazioni fornite dal chatbot siano affidabili.” (% Agree Translation, QL Feedback) |
Nothing |
| ES | Nothing | - Rephrase the italian version of the item “The chatbot's responses feel empathetic and supportive.” into “Le risposte del chatbot risultano empatiche e supportive.” (% Agree Translation, QL Feedback) | - “The chatbot shows to have a sense of humor when required” (% Agree Relevance, Redundancy, QL Feedback) |
| GD | Nothing | - Rephrase the italian version of the item “The chatbot provides adjusted guidance in coping with my problems.” into “Il chatbot fornisce indicazioni personalizzate per aiutarmi a gestire i miei problemi.” (% Agree Translation, QL Feedback) - Rephrase the italian version of the item “The chatbot encourages me to take positive steps.” into “Il chatbot mi incoraggia a compiere azioni costruttive.” (% Agree Translation, QL Feedback) |
Nothing |
| M | Nothing | - Rephrase both the italian translation and the original item (“The chatbot accurately recalls key details from previous conversations.”) into: “The chatbot accurately recalls details from previous conversations.” and “Il chatbot ricorda accuratamente i dettagli delle conversazioni precedenti.” (% Agree Translation, QL Feedback) - Rephrase the italian version of the item “The chatbot maintains consistency by integrating past interactions into current responses.” into “Il chatbot integra coerentemente le interazioni passate nelle risposte.” (% Agree Translation, QL Feedback) |
Nothing |
| OS | Nothing | - Rephrase the italian version of the item “I am overall satisfied with the usability of this chatbot.” into Nel complesso, sono soddisfatto dell'usabilità di questo chatbot.” (% Agree Translation, QL Feedback) - Rephrase both the italian translation and the original item (“I feel that my interactions with the chatbot were worthwhile.”) into: “Overall, I feel that my interactions with the chatbot were worthwhile.” and “Nel complesso, trovo che le mie interazioni con il chatbot siano state proficue.” (% Agree Translation, QL Feedback) - Rephrase both the italian translation and the original item (“I am overall satisfied with the effectiveness of this chatbot.”) into: “I am overall satisfied with the support provided by this chatbot.” and “Nel complesso, sono soddisfatto del supporto offerto da questo chatbot.” (% Agree Translation, % Agree Relevance, QL Feedback) |
Nothing |
Appendix C
Appendix C.1. Demographic Profile of Users who Participated in the Initial Validation
| Characteristic | Value or % (n) | |
| Age | M = 32.02 (SD = 11.55) | |
| Gender | Female | 57.14% (28) |
| Male | 40.81% (20) | |
| Not specified | 2.05% (1) | |
| Education | EQF1 | 0.00% (0) |
| EQF2 | 8.16% (4) | |
| EQF3 | 2.04% (1) | |
| EQF4 | 14.29% (7) | |
| EQF5 | 0.00% (0) | |
| EQF6 | 28.57% (14) | |
| EQF7 | 30.61% (15) | |
| EQF8 | 16.33% (8) | |
| Chatbot Experience | None | 18.37% (9) |
| Basic | 32.65% (16) | |
| Intermediate | 34.69% (17) | |
| Expert | 14.29% (7) | |
| LLM Experience | None | 24.49% (12) |
| Basic | 38.78% (19) | |
| Intermediate | 26.53% (13) | |
| Expert | 10.20% (5) | |
| Propensity to Trust in Technology [71] | M = 3.76 (SD = 0.51) | |
| Country | Italy | 100% (49) |
References
- E. Bendig, B. Erb, L. Schulze-Thuesing, and H. Baumeister, “The Next Generation: Chatbots in Clinical Psychology and Psychotherapy to Foster Mental Health – A Scoping Review,” Verhaltenstherapie, vol. 32, no. Suppl. 1, pp. 64–76, 2022. [CrossRef]
- M. Laymouna, Y. Ma, D. Lessard, T. Schuster, K. Engler, and B. Lebouché, “Roles, Users, Benefits, and Limitations of Chatbots in Health Care: Rapid Review,” J Med Internet Res, vol. 26, p. e56930, Jul. 2024. [CrossRef]
- L. Balcombe, “AI Chatbots in Digital Mental Health,” Informatics, vol. 10, no. 4, p. 82, Oct. 2023. [CrossRef]
- E. M. Boucher et al., “Artificially intelligent chatbots in digital mental health interventions: a review,” Expert Rev Med Devices, vol. 18, no. sup1, pp. 37–49, Dec. 2021. [CrossRef]
- M. Skjuve, A. Følstad, and P. B. Brandtzaeg, “The User Experience of ChatGPT: Findings from a Questionnaire Study of Early Users,” in Proceedings of the 5th International Conference on Conversational User Interfaces, New York, NY, USA: ACM, Jul. 2023, pp. 1–10. [CrossRef]
- S. Limpanopparat, E. Gibson, and D. A. Harris, “User engagement, attitudes, and the effectiveness of chatbots as a mental health intervention: A systematic review,” Computers in Human Behavior: Artificial Humans, vol. 2, no. 2, p. 100081, Aug. 2024. [CrossRef]
- H. L. O’Brien and E. G. Toms, “What is user engagement? A conceptual framework for defining user engagement with technology,” Journal of the American Society for Information Science and Technology, vol. 59, no. 6, pp. 938–955, Apr. 2008. [CrossRef]
- M. Hassenzahl and N. Tractinsky, “User experience - a research agenda,” Behaviour & Information Technology, vol. 25, no. 2, pp. 91–97, Mar. 2006. [CrossRef]
- B. Shackel, “Usability – Context, framework, definition, design and evaluation,” Interact Comput, vol. 21, no. 5–6, pp. 339–346, Dec. 200. [CrossRef]
- J. Moilanen, A. Visuri, S. A. Suryanarayana, A. Alorwu, K. Yatani, and S. Hosio, “Measuring the Effect of Mental Health Chatbot Personality on User Engagement,” in Proceedings of the 21st International Conference on Mobile and Ubiquitous Multimedia, New York, NY, USA: ACM, Nov. 2022, pp. 138–150. [CrossRef]
- 11. S. Gabrielli et al., “Engagement and Effectiveness of a Healthy-Coping Intervention via Chatbot for University Students During the COVID-19 Pandemic: Mixed Methods Proof-of-Concept Study,” JMIR Mhealth Uhealth, vol. 9, no. 5, p. e27965, May 2021. [CrossRef]
- H. L. O’Brien and E. G. Toms, “The development and evaluation of a survey to measure user engagement,” Journal of the American Society for Information Science and Technology, vol. 61, no. 1, pp. 50–69, Jan. 2010. [CrossRef]
- K. Denecke, S. Vaaheesan, and A. Arulnathan, “A Mental Health Chatbot for Regulating Emotions (SERMO) - Concept and Usability Test,” IEEE Trans Emerg Top Comput, vol. 9, no. 3, pp. 1170–1182, Jul. 2021. [CrossRef]
- C. G. Escobar-Viera, G. Porta, R. W. S. Coulter, J. Martina, J. Goldbach, and B. L. Rollman, “A chatbot-delivered intervention for optimizing social media use and reducing perceived isolation among rural-living LGBTQ+ youth: Development, acceptability, usability, satisfaction, and utility,” Internet Interv, vol. 34, p. 100668, Dec. 2023. [CrossRef]
- M. R. Lima, M. Wairagkar, N. Natarajan, S. Vaitheswaran, and R. Vaidyanathan, “Robotic Telemedicine for Mental Health: A Multimodal Approach to Improve Human-Robot Engagement,” Front Robot AI, vol. 8, Mar. 2021. [CrossRef]
- B. Laugwitz, T. Held, and M. Schrepp, “Construction and Evaluation of a User Experience Questionnaire,” 2008, pp. 63–76. [CrossRef]
- 17. J. Shah et al., “Development and usability testing of a chatbot to promote mental health services use among individuals with eating disorders following screening,” International Journal of Eating Disorders, vol. 55, no. 9, pp. 1229–1244, Sep. 2022. [CrossRef]
- K. Boyd et al., “Usability testing and trust analysis of a mental health and wellbeing chatbot,” in Proceedings of the 33rd European Conference on Cognitive Ergonomics, New York, NY, USA: ACM, Oct. 2022, pp. 1–8. [CrossRef]
- M. N. Islam, S. R. Khan, N. N. Islam, Md. Rezwan-A-Rownok, S. R. Zaman, and S. R. Zaman, “A Mobile Application for Mental Health Care During COVID-19 Pandemic: Development and Usability Evaluation with System Usability Scale,” 2021, pp. 33–42. [CrossRef]
- S. Valtolina, P. Zanotti, and S. Mandelli, “Designing Conversational Agents to Empower Active Aging,” in Proceedings of the ACM International Conference on Intelligent Virtual Agents, New York, NY, USA: ACM, Sep. 2024, pp. 1–4. [CrossRef]
- J. Brooke, “SUS: A ‘Quick and Dirty’ Usability Scale,” in Usability Evaluation In Industry, CRC Press, 1996, pp. 207–212. [CrossRef]
- S. Holmes, A. Moorhead, R. Bond, H. Zheng, V. Coates, and M. Mctear, “Usability testing of a healthcare chatbot: Can we use conventional methods to assess conversational user interfaces?,” in Proceedings of the 31st European Conference on Cognitive Ergonomics, New York, NY, USA: ACM, Sep. 2019, pp. 207–214. [CrossRef]
- S. Borsci et al., “The Chatbot Usability Scale: the Design and Pilot of a Usability Scale for Interaction with AI-Based Conversational Agents,” Pers Ubiquitous Comput, vol. 26, no. 1, pp. 95–119, Feb. 2022. [CrossRef]
- T. Henkel, A. J. Linn, and M. J. van der Goot, “Understanding the Intention to Use Mental Health Chatbots Among LGBTQIA+ Individuals: Testing and Extending the UTAUT,” 2023, pp. 83–100.
- T. Kamita, T. Ito, A. Matsumoto, T. Munakata, and T. Inoue, “A Chatbot System for Mental Healthcare Based on SAT Counseling Method,” Mobile Information Systems, vol. 2019, pp. 1–11, Mar. 2019.
- Venkatesh, Morris, Davis, and Davis, “User Acceptance of Information Technology: Toward a Unified View,” MIS Quarterly, vol. 27, no. 3, p. 425, 2003. [CrossRef]
- F. D. Davis, “Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology,” MIS Quarterly, vol. 13, no. 3, p. 319, Sep. 1989. [CrossRef]
- K. Ahuja and P. Lio, “Measuring Empathy in Artificial Intelligence: Insights From Psychodermatology and Implications for General Practice,” Prim Care Companion CNS Disord, vol. 26, no. 5, Oct. 2024. [CrossRef]
- J. Zhao, F. M. Plaza-del-Arco, B. Genchel, and A. C. Curry, “Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks,” Jun. 2024.
- M. Schmidmaier, J. Rupp, D. Cvetanova, and S. Mayer, “Perceived Empathy of Technology Scale (PETS): Measuring Empathy of Systems Toward the User,” in Proceedings of the CHI Conference on Human Factors in Computing Systems, New York, NY, USA: ACM, May 2024, pp. 1–18. [CrossRef]
- S. Concannon and M. Tomalin, “Measuring perceived empathy in dialogue systems,” AI Soc, vol. 39, no. 5, pp. 2233–2247, Oct. 2024. [CrossRef]
- Miloff, P. Carlbring, W. Hamilton, G. Andersson, L. Reuterskiöld, and P. Lindner, “Measuring Alliance Toward Embodied Virtual Therapists in the Era of Automated Treatments With the Virtual Therapist Alliance Scale (VTAS): Development and Psychometric Evaluation,” J Med Internet Res, vol. 22, no. 3, p. e16660, Mar. 2020.
- S. Wei, D. Freeman, and A. Rovira, “A randomised controlled test of emotional attributes of a virtual coach within a virtual reality (VR) mental health treatment,” Sci Rep, vol. 13, no. 1, p. 11517, Jul. 2023.
- H. Q. Yu and S. McGuinness, “An experimental study of integrating fine-tuned large language models and prompts for enhancing mental health support chatbot system,” J Med Artif Intell, vol. 7, pp. 16–16, Jun. 2024. [CrossRef]
- R. Crasto, L. Dias, D. Miranda, and D. Kayande, “CareBot: A Mental Health ChatBot,” in 2021 2nd International Conference for Emerging Technology (INCET), IEEE, May 2021, pp. 1–5. [CrossRef]
- Srivastava, I. Pandey, M. S. Akhtar, and T. Chakraborty, “Response-act Guided Reinforced Dialogue Generation for Mental Health Counseling,” in Proceedings of the ACM Web Conference 2023, New York, NY, USA: ACM, Apr. 2023, pp. 1118–1129. [CrossRef]
- M. N. Kaysar and S. Shiramatsu, “Mental State-Based Dialogue System for Mental Health Care by Using GPT-3,” 2024, pp. 891–901. [CrossRef]
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU,” in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL ’02, Morristown, NJ, USA: Association for Computational Linguistics, 2001, p. 311. [CrossRef]
- C.-Y. Lin, “ROUGE: A Package for Automatic Evaluation of Summaries,” in Text Summarization Branches Out, Barcelona, Spain: Association for Computational Linguistics, 2004, pp. 74–81.
- N. M. Radziwill and M. C. Benton, “Evaluating Quality of Chatbots and Intelligent Conversational Agents,” Apr. 2017.
- H. Ding, J. Simmich, A. Vaezipour, N. Andrews, and T. Russell, “Evaluation framework for conversational agents with artificial intelligence in health interventions: a systematic scoping review,” Journal of the American Medical Informatics Association, vol. 31, no. 3, pp. 746–761, Feb. 2024. [CrossRef]
- H. Donohoe, M. Stellefson, and B. Tennant, “Advantages and Limitations of the e-Delphi Technique,” Am J Health Educ, vol. 43, no. 1, pp. 38–46, Jan. 2012. [CrossRef]
- Belton, A. MacDonald, G. Wright, and I. Hamlin, “Improving the practical application of the Delphi method in group-based judgment: A six-step prescription for a well-founded and defensible process,” Technol Forecast Soc Change, vol. 147, pp. 72–82, Oct. 2019. [CrossRef]
- S. S. McMillan, M. King, and M. P. Tully, “How to use the nominal group and Delphi techniques,” Int J Clin Pharm, Feb. 2016. [CrossRef]
- S. Jünger, S. A. Payne, J. Brine, L. Radbruch, and S. G. Brearley, “Guidance on Conducting and REporting DElphi Studies (CREDES) in palliative care: Recommendations based on a methodological systematic review,” Palliat Med, vol. 31, no. 8, pp. 684–706, Sep. 2017. [CrossRef]
- Denecke, R. May, and O. Rivera Romero, “Potential of Large Language Models in Health Care: Delphi Study,” J Med Internet Res, vol. 26, p. e52399, May 2024. [CrossRef]
- W. Maroengsit, T. Piyakulpinyo, K. Phonyiam, S. Pongnumkul, P. Chaovalit, and T. Theeramunkong, “A Survey on Evaluation Methods for Chatbots,” in Proceedings of the 2019 7th International Conference on Information and Education Technology, New York, NY, USA: ACM, Mar. 2019, pp. 111–119. [CrossRef]
- Denecke, A. Abd-Alrazaq, M. Househ, and J. Warren, “Evaluation Metrics for Health Chatbots: A Delphi Study,” Methods Inf Med, vol. 60, no. 05/06, pp. 171–179, Dec. 2021. [CrossRef]
- Z. Guo, A. Lai, J. H. Thygesen, J. Farrington, T. Keen, and K. Li, “Large Language Model for Mental Health: A Systematic Review,” Feb. 2024. [CrossRef]
- T. Y. C. Tam et al., “A Framework for Human Evaluation of Large Language Models in Healthcare Derived from Literature Review,” May 2024.
- Y. Chang et al., “A Survey on Evaluation of Large Language Models,” ACM Trans Intell Syst Technol, vol. 15, no. 3, pp. 1–45, Jun. 2024. [CrossRef]
- J.-L. Peng et al., “A Survey of Useful LLM Evaluation,” Jun. 2024.
- “Qualtrics XM,” 2024.
- “Mistral Large 2407,” 2024.
- D. G. Saari, “Selecting a voting method: the case for the Borda count,” Constitutional Political Economy, vol. 34, no. 3, pp. 357–366, Sep. 2023. [CrossRef]
- V. Clarke and V. Braun, “Thematic analysis,” J Posit Psychol, vol. 12, no. 3, pp. 297–298, May 2017. [CrossRef]
- Sheinis and A. Selk, “Development of the Adult Vulvar Lichen Sclerosus Severity Scale—A Delphi Consensus Exercise for Item Generation,” J Low Genit Tract Dis, vol. 22, no. 1, pp. 66–73, Jan. 2018. [CrossRef]
- S. M. Bauer, A. Fusté, A. Andrés, and C. Saldaña, “The Barcelona Orthorexia Scale (BOS): development process using the Delphi method,” Eating and Weight Disorders - Studies on Anorexia, Bulimia and Obesity, vol. 24, no. 2, pp. 247–255, Apr. 2019. [CrossRef]
- T. Xin, X. Ding, H. Gao, C. Li, Y. Jiang, and X. Chen, “Using Delphi method to develop Chinese women’s cervical cancer screening intention scale based on planned behavior theory,” BMC Womens Health, vol. 22, no. 1, p. 512, Dec. 2022. [CrossRef]
- V. C. Scott, J. Temple, and Z. Jillani, “Development of the Technical Assistance Engagement Scale: a modified Delphi study,” Implement Sci Commun, vol. 5, no. 1, p. 84, Jul. 2024. [CrossRef]
- World Health Organization, Doing What Matters in Times of Stress: An Illustrated Guide. 2020.
- 62. J. Cronbach, “Coefficient Alpha and the Internal Structure of Tests,” Psychometrika, vol. 16, no. 3, pp. 297–334, Sep. 1951. [CrossRef]
- J. P. Guilford, “The Correlation of an Item With a Composite of the Remaining Items in a Test,” Educ Psychol Meas, vol. 13, no. 1, pp. 87–93, Apr. 1953. [CrossRef]
- Tavakol and R. Dennick, “Making sense of Cronbach’s alpha,” Int J Med Educ, vol. 2, pp. 53–55, Jun. 2011. [CrossRef]
- Röschel, C. Wagner, and M. Dür, “Examination of validity, reliability, and interpretability of a self-reported questionnaire on Occupational Balance in Informal Caregivers (OBI-Care) – A Rasch analysis,” PLoS One, vol. 16, no. 12, p. e0261815, Dec. 2021. [CrossRef]
- G. G. Zieve, L. D. Sarfan, L. Dong, S. S. Tiab, M. Tran, and A. G. Harvey, “Cognitive Therapy-as-Usual versus Cognitive Therapy plus the Memory Support Intervention for adults with depression: 12-month outcomes and opportunities for improved efficacy in a secondary analysis of a randomized controlled trial,” Behaviour Research and Therapy, vol. 170, p. 104419, Nov. 2023. [CrossRef]
- L. Dong et al., “Can integrating the Memory Support Intervention into cognitive therapy improve depression outcome? A randomized controlled trial,” Behaviour Research and Therapy, vol. 157, p. 104167, Oct. 2022. [CrossRef]
- S. Ouhbi, A. Idri, J. L. Fernández-Alemán, A. Toval, and H. Benjelloun, “Applying ISO/IEC 25010 on Mobile Personal Health Records,” in Proceedings of the International Conference on Health Informatics, SCITEPRESS - Science and and Technology Publications, 2015, pp. 405–412. [CrossRef]
- M. Blut, C. Wang, N. V. Wünderlich, and C. Brock, “Understanding anthropomorphism in service provision: a meta-analysis of physical robots, chatbots, and other AI,” J Acad Mark Sci, vol. 49, no. 4, pp. 632–658, Jul. 2021. [CrossRef]
- F. Eyssel and N. Reich, “Loneliness makes the heart grow fonder (of robots) — On the effects of loneliness on psychological anthropomorphism,” in 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI), IEEE, Mar. 2013, pp. 121–122. [CrossRef]
- S. Jessup, T. Schneider, G. Alarcon, T. Ryan, and A. Capiola, “The Measurement of the Propensity to Trust Technology,” 2018.


| Characteristic | Value or % (n) | |
|---|---|---|
| Age | M = 34.50 (SD = 10.66) | |
| Gender | Male | 56.25% (9) |
| Female | 43.75% (7) | |
| Education | Bachelor | 12.50% (2) |
| Master | 50.00% (8) | |
| Doctorate | 31.25% (5) | |
| PsyD Specialization | 6.25% (1) | |
| Area of expertise | Psychology | 31.25% (5) |
| Artificial Intelligence | 31.25% (5) | |
| Human Computer Interaction | 18.75% (3) | |
| Digital Therapeutics | 18.75% (3) | |
| Occupation | Researcher | 50.00% (8) |
| Developer (AI) | 37.50% (6) | |
| Psychologist | 12.50% (2) | |
| Job seniority | 3-5 years | 50.00% (8) |
| 6-10 years | 18.75% (3) | |
| 11-15 years | 12.50% (2) | |
| 16-20 years | 0.00% (0) | |
| 21+ years | 18.75% (3) | |
| Country | Italy | 100% (16) |
| Dimension | Item | Priority |
|---|---|---|
| Understanding requests [UR] | The chatbot consistently understands what I am saying and asking. Il chatbot capisce ciò che sto dicendo e chiedendo. |
1 |
| The chatbot is able to make adequate inferences based on my messages. Il chatbot è in grado di fare deduzioni appropriate basandosi sui miei messaggi. |
2 | |
| The chatbot asks specific questions to better understand my requests. Il chatbot fa domande specifiche per capire meglio le mie richieste. |
3 | |
| Providing helpful information [PHI] | The chatbot provides accurate information. Il chatbot fornisce informazioni accurate. |
1 |
| The chatbot provides helpful information. Il chatbot fornisce informazioni utili. |
2 | |
| The chatbot provides information grounded in theory and scientific literature. Il chatbot fornisce informazioni supportate da teorie e letteratura scientifica. |
3 | |
| Clarity and relevance of responses [CRR] | The chatbot's responses are clear, and easy to understand. Le risposte del chatbot sono chiare e semplici da capire. |
1 |
| The chatbot's responses are adequately concise. Le risposte del chatbot sono sufficientemente concise. |
2 | |
| The chatbot's responses are irrelevant to my questions. Le risposte del chatbot non sono pertinenti alle mie domande. |
3 | |
| Language quality [LQ] | The chatbot uses correct grammar and spelling in its responses. Il chatbot fornisce risposte grammaticalmente e ortograficamente corrette. |
1 |
| The chatbot's language is appropriate for the context. Il linguaggio del chatbot è appropriato al contesto. |
2 | |
| The chatbot's language style sounds natural Lo stile linguistico del chatbot suona naturale. |
3 | |
| Trust [T] | I feel safe sharing my personal matters with the chatbot. Mi sento al sicuro nel condividere questioni personali con il chatbot. |
1 |
| I believe that the feedback and the information provided by the chatbot are trustworthy. Credo che i feedback e le informazioni fornite dal chatbot siano affidabili. |
2 | |
| I believe the chatbot is transparent about its limitations and capabilities. Credo che il chatbot sia trasparente riguardo ai suoi limiti e alle sue capacità |
3 | |
| Emotional support [ES] | The chatbot makes me feel heard and understood. Il chatbot mi fa sentire ascoltato e capito. |
1 |
| The chatbot's responses feel empathetic and supportive. Le risposte del chatbot risultano empatiche e supportive. |
2 | |
| The chatbot's responses can make me feel reassured Le risposte del chatbot sono in grado di farmi sentire rassicurato. |
3 | |
| Guidance and direction [GD] | The chatbot provides adjusted guidance in coping with my problems. Il chatbot fornisce indicazioni personalizzate per aiutarmi a gestire i miei problemi. |
1 |
| The chatbot encourages me to take positive steps. Il chatbot mi incoraggia a compiere azioni costruttive. |
2 | |
| The chatbot helps me set realistic and achievable goals. Il chatbot mi aiuta a stabilire obiettivi realistici e raggiungibili. |
3 | |
| Memory [M] | The chatbot accurately recalls details from previous conversations. Il chatbot ricorda accuratamente i dettagli delle conversazioni precedenti. |
1 |
| The chatbot maintains consistency by integrating past interactions into current responses. Il chatbot integra coerentemente le interazioni passate nelle risposte. |
2 | |
| The chatbot adapts its advice based on information provided in earlier sessions. Il chatbot adatta i suoi consigli in base alle informazioni fornite nelle sessioni precedenti. |
3 | |
| Overall satisfaction [OS] | I am overall satisfied with the usability of this chatbot. Nel complesso, sono soddisfatto dell'usabilità di questo chatbot. |
1 |
| Overall, I feel that my interactions with the chatbot were worthwhile. Nel complesso, trovo che le mie interazioni con il chatbot siano state proficue. |
2 | |
| I am overall satisfied with the support provided by this chatbot Nel complesso, sono soddisfatto del supporto offerto da questo chatbot. |
3 |
| Dimension | Inter-Item Correlation | Item-total correlation | Cronbach’s α | ||
|---|---|---|---|---|---|
| Mean | Range | Mean | Range | ||
| UR | 0.42 | 0.28-0.54 | 0.50 | 0.41-0.61 | 0.68 |
| PHI | 0.58 | 0.44-0.73 | 0.65 | 0.55-0.76 | 0.79 |
| CRR | 0.28 | 0.02-0.71 | 0.33 | 0.06-0.54 | 0.47 |
| LQ | 0.40 | 0.23-0.61 | 0.47 | 0.31-0.63 | 0.63 |
| T | 0.55 | 0.41-0.74 | 0.63 | 0.49-0.73 | 0.78 |
| ES | 0.77 | 0.70-0.86 | 0.82 | 0.76-0.88 | 0.91 |
| GD | 0.48 | 0.33-0.73 | 0.54 | 0.38-0.64 | 0.71 |
| M | 0.55 | 0.45-0.65 | 0.63 | 0.55-0.71 | 0.78 |
| OS | 0.75 | 0.72-0.77 | 0.80 | 0.78-0.82 | 0.90 |
| Overall | N/A | N/A | N/A | N/A | 0.94 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
