How LangMon's sentiment analysis works
Version 2: October 2024 - present
In October, several updates were made to the sentiment analysis process described in version 1 below. The October evals are the first to use this version.
Split out the aspect extraction prompt from the sentiment analysis prompt
In version 1, the sentiment analysis prompt had two jobs: to provide a numeric sentiment score and to provide a list of positives and negatives extracted from the text. In version 2, we split this into two different prompts.
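The split might be sketched as two independent prompt templates rendered from the same source text, each sent to the LLM separately. The template text and names here are hypothetical; the actual prompts LangMon uses are not reproduced in this document.

```python
# Hypothetical stand-ins for the two v2 prompts; the real prompt text differs.
SENTIMENT_PROMPT = (
    "Evaluate the overall sentiment of the text below towards {company} "
    "on a scale from -10 to +10.\n\nText:\n{text}"
)
ASPECT_PROMPT = (
    "List the positive and negative aspects of {company} mentioned in "
    "the text below.\n\nText:\n{text}"
)

def build_prompts(company: str, text: str) -> tuple[str, str]:
    """Render both prompts from one source text; each job is now separate."""
    return (
        SENTIMENT_PROMPT.format(company=company, text=text),
        ASPECT_PROMPT.format(company=company, text=text),
    )
```

Keeping the two jobs in separate prompts means each one can be run (and re-run) on its own schedule, which is what makes the averaging change below cheap to apply to the score alone.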
Calculate an average sentiment score
In version 2, the sentiment analysis prompt is run five times, and the scores are averaged to produce the final score. In tests, sentiment scores varied between evals; averaging should reduce that variance over time, though this remains to be seen. Further, when available, the temperature parameter for the inference is set to 0 to reduce randomness in the LLM's output.
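The averaging step can be sketched as below. The `run_prompt` callable is a stand-in for whatever invokes the LLM (with temperature 0 where supported) and returns an integer score; the real call is not shown.

```python
import statistics

def average_sentiment(run_prompt, n_runs: int = 5) -> float:
    """Run the sentiment prompt n_runs times and average the scores.

    `run_prompt` is a hypothetical callable that sends the prompt to the
    LLM and returns an integer score between -10 and +10.
    """
    scores = [run_prompt() for _ in range(n_runs)]
    return statistics.mean(scores)

# Example with a stubbed runner returning slightly varying scores:
fake_scores = iter([6, 7, 6, 5, 6])
print(average_sentiment(lambda: next(fake_scores)))  # 6
```

The averaged score is what gets stored, so downstream consumers (the database and web app) are unaffected by the change.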
The aspect extraction prompt is still run just once.
Switch the LLM used for sentiment analysis and aspect extraction from Claude 3.5 Sonnet to GPT 4o Mini
Running many more prompts means higher cost, and Claude 3.5 Sonnet is comparatively expensive. Further, tests showed that Claude 3.5 Sonnet and GPT 4o Mini returned similar sentiment scores.
Version 1: July - September 2024
If LangMon itself is a proof of concept, then v1 is, well, very rough. Two prompts are run for each of the LLMs that LangMon monitors.
Prompt 1
The first prompt asks the LLM being tested for its opinion on the business. A longer version of "Hey, what's your opinion of this business? Try to be as objective as possible, considering both positive and negative aspects of the brand".
The initial version of this prompt was generated using Anthropic's fantastic Prompt Generator tool, which is helpful for taking an idea of a prompt and expanding on it in a way that LLMs are good at following. This prompt is evaluated using Promptfoo across all the models tracked. The results are then stored in a database.
Prompt 2
The second prompt takes the response from the first prompt as an input. An LLM (in this case Anthropic's Claude 3.5 Sonnet model) is asked to perform a sentiment analysis of the provided text. The prompt asks the LLM to follow these steps:
- Carefully read and analyze the provided text.
- Identify key phrases, words, or sentences that express sentiment towards {{companyName}} or {{companyName}} products.
- Consider both explicit and implicit expressions of sentiment.
- Take into account the context and tone of the text.
- Evaluate the overall sentiment on a scale from -10 (extremely negative) to +10 (extremely positive), with 0 being neutral. The score should be an integer value between -10 and 10.
The LLM is asked to extract a list of positives and negatives from the text and provide a numeric sentiment score between -10 and +10. Using an LLM to run this analysis was a quick way to see results and establish a baseline. LLMs are proving to be quite effective for these types of natural language processing tasks. They can pick up on tone, context, negation, and many more features that would need to be added one by one to a more traditional sentiment analysis pipeline.
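Whatever the model returns still has to be reduced to a validated integer before storage. As a minimal sketch, assuming (hypothetically) the prompt asks the model to wrap its score in `<score>` tags, extraction and range-checking might look like:

```python
import re

def parse_sentiment_score(response_text: str) -> int:
    """Extract and validate the integer score from an LLM response.

    Assumes a hypothetical <score>N</score> output format; the real
    prompt's output format may differ.
    """
    match = re.search(r"<score>\s*(-?\d+)\s*</score>", response_text)
    if match is None:
        raise ValueError("no score found in response")
    score = int(match.group(1))
    if not -10 <= score <= 10:
        raise ValueError(f"score {score} outside the -10..10 range")
    return score

print(parse_sentiment_score("The text is mildly positive.\n<score>4</score>"))
```

Validating the range on the way in guards against the model occasionally ignoring the "-10 to +10" instruction.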
This approach has drawbacks, though, namely losing the ability to examine the inner workings of the analysis. If the LLM changes, its response to the prompt could change. Further, the response could be affected by the parameters sent to the LLM, such as temperature and logit bias.
The sentiment analysis prompt is run once, and the result is stored in the database.
Each business is evaluated once per month, and the trailing 3-month scores for each are sent to the web app for display. The evals in July, August and September all used this version.
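The display step amounts to selecting the most recent three monthly scores per business. A minimal sketch, assuming a hypothetical list of (month, score) pairs ordered oldest to newest as they might come out of the database:

```python
def trailing_scores(monthly_scores: list[tuple[str, int]],
                    window: int = 3) -> list[tuple[str, int]]:
    """Return the most recent `window` months of scores for display.

    `monthly_scores` is a hypothetical (month, score) history, oldest
    first; the real database schema is not described here.
    """
    return monthly_scores[-window:]

history = [("2024-07", 5), ("2024-08", 6), ("2024-09", 4), ("2024-10", 7)]
print(trailing_scores(history))  # the last three months
```

With one eval per business per month, the window simply slides forward each time a new score lands.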