
Mark Proctor

Mark Proctor's Website

How Do Large Language Models (LLMs) Work?

September 17, 2024 By Mark Proctor

Large Language Models (LLMs) have taken centre stage in artificial intelligence (AI) applications.

They power a wide range of services, from conversational agents like ChatGPT to advanced text analysis tools. But how do these models work? 

We need to break down the complex mechanisms and processes involved in training, optimising, and deploying these models and their architectural principles.

1) The Basics of Language Modeling

A language model is a statistical model that predicts the probability of a word or sequence of words given the context of preceding words. The goal is to understand and predict language patterns, enabling the model to generate text, answer questions, or engage in dialogue.

Language models range from simple n-gram models, which predict the next word based on the last “n” words, to more sophisticated models like recurrent neural networks (RNNs) and transformers, which analyse longer and more complex sequences of text. 
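To make the n-gram idea concrete, here is a minimal sketch of a bigram model (n = 1 word of context). The tiny corpus and the function names are made up for illustration; a real model would be trained on far more text.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the raw counts for one context word into probabilities."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}

# Toy corpus (an assumption for this example, not real training data).
corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)

# "the" is followed by "cat" twice and "mat" once in the corpus.
print(next_word_probs(model, "the"))
```

The model only ever looks one word back, which is exactly the limitation that RNNs and transformers were designed to overcome.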

The transition to LLMs like GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers) marks a significant leap due to their ability to capture and understand intricate language patterns.

LLMs are built using deep learning techniques, particularly neural networks, which are computing systems designed to recognise patterns. These models learn the structure of language by ingesting vast amounts of data, then use this information to generate human-like text.

2) Transformer Architecture: The Foundation of LLMs

The foundation of LLMs like GPT-4, BERT, and others lies in the Transformer architecture, which was introduced by Vaswani et al. in their 2017 paper titled “Attention is All You Need.” 

Before transformers, RNNs and LSTMs (Long Short-Term Memory networks) were popular for language tasks. They had limitations, particularly in handling long-range dependencies and in parallelising computation during training.

Key Concepts in Transformers

– Self-Attention Mechanism: This is a central innovation of the transformer architecture.

Instead of processing sequences in order like RNNs, transformers analyse relationships between words in a sentence at once, allowing them to weigh the importance of each word relative to others, regardless of their position. This mechanism is crucial for understanding context in long passages.

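– Positional Encoding: While transformers process all tokens simultaneously, they still need to understand the order of words. Positional encoding introduces this order by adding a position-specific vector to each word embedding. The original paper used fixed sinusoidal vectors, while models such as BERT and GPT-2 learn them during training.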

– Multi-Head Attention: To capture different types of linguistic relationships, transformers use multiple “attention heads” to focus on various aspects of the input sequence simultaneously.

– Feedforward Neural Networks: Once the self-attention layer is completed, transformers use standard feedforward layers to further refine and interpret the information.

– Layer Normalisation: After each operation, transformers apply normalisation to maintain stable training and prevent degradation of gradients.

This architecture revolutionised NLP (Natural Language Processing) because it allowed models to handle far more complex tasks than before. It’s the backbone of most state-of-the-art LLMs, including the GPT (Generative Pre-trained Transformer) series from OpenAI.
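The self-attention step described above can be sketched in a few lines of plain Python. This is a simplified illustration, not a production implementation: the embeddings are made-up numbers, there is a single attention head, and the queries, keys and values all reuse the same vectors, whereas a real transformer derives them from learned projection matrices.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score this token's query against every token's key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Blend the value vectors using the attention weights.
        blended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
    return outputs

# Toy 2-dimensional embeddings for three tokens (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(x, x, x)
print(out)  # one context-aware vector per input token
```

Each output vector is a weighted mix of every token's value vector, which is how a word can "look at" any other word regardless of distance. Multi-head attention runs several such computations in parallel with different projections.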

3) Training Process and Datasets

Training a large language model is computationally intensive and requires massive datasets. LLMs are typically pre-trained on extensive corpora of text data, which may include books, websites, academic papers, and more. The training process can take weeks or months on powerful hardware (usually GPUs or TPUs), and it involves several stages:

Pre-training

In the pre-training phase, the model is trained to predict missing words in a sequence (a task called masked language modelling in BERT) or to predict the next word in a sequence (as in causal language modelling in GPT). This stage is self-supervised (often loosely described as unsupervised), meaning the model doesn’t require human-labelled data; the text itself supplies the answers, and the model learns by trying to predict hidden or upcoming words from the patterns it has observed.
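As a rough sketch of how raw text becomes training examples without any labelling, the two helpers below build next-word pairs and masked examples from a token list. The function names, the "[MASK]" marker and the default masking rate are assumptions chosen for this example.

```python
import random

def causal_examples(tokens):
    """Next-word prediction: pair each prefix with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_example(tokens, mask_rate=0.15, seed=0):
    """Masked prediction: hide some tokens and keep them as the targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok  # remember what was hidden at this position
        else:
            masked.append(tok)
    return masked, targets

tokens = "the cat sat on the mat".split()
for context, target in causal_examples(tokens):
    print(context, "->", target)

masked, targets = masked_example(tokens, mask_rate=0.3)
print(masked, targets)
```

In both cases the "label" is simply a word taken from the original text, which is why no human annotation is needed.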

During pre-training, the model adjusts its weights—a set of parameters that represent what the model has learned—to minimise the error in its predictions. These weights are updated iteratively using techniques like backpropagation and gradient descent, optimising the model over time.
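The weight-update loop can be illustrated with a deliberately tiny model: one weight, fitted by gradient descent on a mean squared error. This is only a sketch of the principle; an LLM has billions of weights, uses a different loss, and relies on backpropagation to compute the gradients. The learning rate and step count are arbitrary choices for this example.

```python
def train(xs, ys, lr=0.05, steps=200):
    """Fit y = w * x by repeatedly stepping against the error gradient."""
    w = 0.0  # the single "weight" the model learns
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Move the weight a small step in the direction that reduces error.
        w -= lr * grad
    return w

# Toy data generated from y = 2x, so the learned weight should approach 2.
w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 4))
```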

Fine-tuning

After pre-training, LLMs are often fine-tuned on smaller, task-specific datasets using supervised learning. A model might be fine-tuned on question-answering data, summarization tasks, or sentiment analysis to make it more effective for specific applications.

Fine-tuning allows the model to adapt its general knowledge to particular tasks, making it more precise in areas where high accuracy is required.

4) How LLMs Generate Text

Once trained, LLMs can generate text in a manner that mimics human-like responses.

Input Tokenization

When a user inputs text, the model first converts the input into tokens. Tokens are small units of text, which could be individual words or even sub-word fragments. Tokenization is essential because it helps the model work with fixed vocabulary sizes and ensures that even out-of-vocabulary words can be represented through sub-word components.
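Here is a toy sketch of sub-word tokenization using greedy longest-match against a small, hand-written vocabulary. Real tokenizers learn their vocabulary from data (for example with byte-pair encoding) and handle many more edge cases, so treat the vocabulary and the "[UNK]" fallback as assumptions made for illustration.

```python
def tokenize(word, vocab):
    """Greedy longest-match split of one word into known sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it matches a vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No piece matched: fall back to an unknown-token marker.
            return ["[UNK]"]
        pieces.append(word[start:end])
        start = end
    return pieces

# Hypothetical vocabulary of sub-word fragments.
vocab = {"token", "ization", "un", "believ", "able", "s"}

print(tokenize("tokenization", vocab))  # ['token', 'ization']
print(tokenize("unbelievable", vocab))  # ['un', 'believ', 'able']
```

Neither full word is in the vocabulary, yet both can still be represented from smaller pieces.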

Contextual Understanding

Next, the model applies its deep learning layers to analyse the context of the tokens. The self-attention mechanism helps the model understand relationships between different parts of the input, identifying which tokens are most relevant in generating the next word.

Text Generation

When generating text, the model predicts one token at a time, using probability distributions over the possible tokens. These probabilities are generated based on the input and the model’s learned knowledge. The model may then select the most probable token, or employ techniques like temperature scaling (which controls the randomness of the predictions) or beam search (which considers multiple potential sequences before selecting the most appropriate one).
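A minimal sketch of temperature scaling is shown below. The scores are invented, and treating a temperature of zero as "always pick the most probable token" is a convention assumed for this example rather than something every library does.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Sample one token from a dict of raw scores after temperature scaling."""
    tokens = list(logits)
    if temperature == 0:
        # Treat zero temperature as greedy decoding.
        return max(tokens, key=lambda t: logits[t])
    # Dividing by the temperature sharpens (<1) or flattens (>1) the distribution.
    scaled = [logits[t] / temperature for t in tokens]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical scores for the word following "The cat sat on the".
logits = {"mat": 3.0, "sofa": 2.0, "moon": 0.5}

print(sample_next_token(logits, temperature=0))    # always "mat"
print(sample_next_token(logits, temperature=1.5))  # varies from run to run
```

Low temperatures make output more predictable; higher temperatures give less likely tokens such as "moon" a better chance of being chosen.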

This step-by-step prediction is repeated until the model generates the full sequence of text, resulting in coherent and contextually relevant sentences.

5) Practical Applications of LLMs

LLMs have a wide range of practical applications, spanning various industries:

– Conversational AI: LLMs are used to power chatbots and virtual assistants that can engage in natural conversations, handle customer queries, and provide recommendations.

– Content Creation: They can generate creative writing, including articles, stories, and poetry. LLMs are increasingly used in marketing to draft emails, blog posts, and social media content.

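– Language Translation: Transformer-based models can be fine-tuned for translation tasks, helping bridge language barriers. Generative models like GPT are the natural fit here, since encoder-only models like BERT do not produce text on their own.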

– Code Generation: Codex, a variant of GPT-3, can generate computer code from natural language descriptions, assisting software developers.

– Text Summarization: LLMs can condense long documents or articles into shorter summaries while preserving the key points.

– Sentiment Analysis: Companies use LLMs to analyse customer reviews, social media posts, and other textual data to gauge sentiment and make business decisions.

6) Limitations and Future Directions

Despite their impressive capabilities, LLMs have several limitations:

– Bias and Fairness: LLMs often reflect biases present in their training data, which can result in biased or harmful outputs. Researchers are actively working on methods to mitigate bias and ensure that AI systems are fair and unbiased.

– Context Length: While LLMs can handle long sequences of text, they still struggle with maintaining coherence in very long conversations or documents. Efforts are being made to extend context windows and improve the models’ ability to handle longer-term dependencies.

– Data and Energy Costs: Training large language models requires massive amounts of data and computational power. This has raised concerns about the environmental impact of such models, as well as the accessibility of AI technologies to smaller organisations that may not have the resources to train them.

– Factual Accuracy: LLMs sometimes generate plausible but incorrect or nonsensical answers. This is because the models are not truly “understanding” the world—they are pattern-recognition systems that rely on statistical correlations in the data rather than actual knowledge.

7) The Future of LLMs

We can expect LLMs to become more efficient, accurate, and accessible. Areas of active development include:

– Smaller, more efficient models that deliver performance comparable to large models but with significantly lower resource requirements.

– Multimodal models, which integrate not just text but also images, audio, and video, enabling more comprehensive understanding and interaction.

– Real-time adaptability, where models can update their knowledge dynamically rather than being fixed after training.

Large language models have revolutionised the way machines understand and generate human language. Their underlying architecture, built on the transformer model and combined with massive datasets and sophisticated training techniques, allows LLMs to perform a wide range of tasks that were previously thought to be the exclusive domain of humans.

Challenges like bias, high computational costs, and occasional inaccuracies remain. As the field evolves, LLMs will likely become even more powerful and adaptable, opening new possibilities for AI-driven innovation across industries.

What Is Social Media Marketing?

March 7, 2023 By Mark Proctor

The way marketing works is evolving. Brands need to adapt to stay ahead. Social Media Marketing is here to stay.

Social media is one of the most powerful ways to reach and engage with customers. Universally used by consumers and brands, it is one of the most effective channels to connect with your audience.

Influencer marketing continues to grow at double-digit rates because it’s effective. People are tired of being interrupted by ads and trust recommendations from those they follow, so expect to see more brands partner with influencers to promote their products.

It costs four to ten times more to acquire a customer than to retain one. To keep your customers around, use social media to support, communicate and engage. A good social media presence will translate into a better relationship with your brand.

Social media marketing has matured over the last 10 years to become an integral part of the marketing mix for large and small businesses. It can have a significant impact on your bottom line and can be a powerful marketing tool.

Whether you are trying to reach a local audience or launching a brand nationwide, social media marketing should be considered as part of your marketing strategy.

The Four C’s of Social Media Marketing

Communities

The most important element of a social media marketing campaign is people: real people, not Facebook pages, or LinkedIn profiles.

Unlike other marketing channels, e.g. banner ads or email, social media marketing needs to be social.

Building your target community database is similar to building an email list with one important exception: your intent is to facilitate conversation that builds interaction – not broadcasting a single message to a list.

Conversations

Reciprocity is the currency of social networking. To realise ROI in social media marketing, you have to believe in social karma.

Conversations are happening all over social media. Millions of these conversations are about products and services.

As a brand owner, do not be afraid to talk, share, educate and ask questions. Look at social media as a way to humanise your brand. If consumers see and like you, they will like your company.

Be authentic, social and friendly. People engage with 11.4 pieces of content before making a purchase. Give your audience value they can’t get anywhere else. Give them the incentive they need to click and buy!

Channels

Each social channel comes with a unique social etiquette and language for enabling social conversations.

LinkedIn and Twitter
A lengthy and informative blog post might work for LinkedIn, but that post won’t work on Twitter. You need to create a cut-down or visual representation of the same content.

Facebook
The Facebook audience responds best to visually engaging posts that are entertaining, informative, inspirational or rewarding. The best content types to use are photos, videos, quizzes and competitions.

Instagram
Use Instagram to inspire and engage your potential customers. You must commit to sharing a frequent stream of high-quality, inspiring and engaging photos and videos.

Campaigns

An advertising campaign is designed, measured and optimised across target audience segments, message variations and advertising channels. In the same way, your social media campaign needs to be designed, measured and optimised across target communities, various topics of conversation, and social media channels.

The first step to creating a social media marketing strategy is to establish your objectives and goals. Without goals, you have no way to measure your social media return on investment (ROI).

Knowing who your audience is and what they want to see on social is key to creating content that they will engage with. This knowledge is also critical for planning how to develop your social media audience into customers for your business.

Create audience personas, i.e. simplified categories of buyers of your product. Each persona will have different demographics, buying motivations, common buying objections, and emotional needs.

How To Prioritise New Product Features

July 20, 2021 By Mark Proctor

Value versus Complexity

With the Value vs Complexity model, you evaluate every opportunity based on its business value and relative complexity to implement.

Initiatives that have the highest value and the lowest effort will be the low-hanging fruit for your roadmap.
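One simple way to surface the low-hanging fruit is to rank initiatives by their value-to-complexity ratio. The sketch below assumes each initiative has been scored from 1 to 10 on both axes; the initiative names, the scores and the use of a ratio are all illustrative choices, not a prescribed method.

```python
def rank_by_value_vs_complexity(initiatives):
    """Sort initiatives so high-value, low-complexity items come first."""
    return sorted(
        initiatives,
        key=lambda item: item["value"] / item["complexity"],
        reverse=True,
    )

# Hypothetical backlog with made-up 1-10 scores.
initiatives = [
    {"name": "Single sign-on", "value": 8, "complexity": 8},
    {"name": "CSV export", "value": 6, "complexity": 2},
    {"name": "Dark mode", "value": 3, "complexity": 3},
]

for item in rank_by_value_vs_complexity(initiatives):
    print(item["name"], round(item["value"] / item["complexity"], 2))
```

Here "CSV export" ranks first: it is not the most valuable item, but it delivers the most value per unit of effort.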

Weighted Scoring

Weighted Scoring starts with the Value versus Complexity model, but layers in scoring to achieve an objective result. The result is a scorecard to rank our initiatives and features against benefit and cost categories.

We can customise the inputs that go into a product decision. Ultimately, a scoring model can help the team have an objective conversation rather than having a room full of opinions!
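A scorecard like this boils down to a weighted sum. In the sketch below the criteria, weights and feature scores are all hypothetical, and giving cost criteria a negative weight is just one possible convention for letting costs pull the total down.

```python
def weighted_score(scores, weights):
    """Combine per-criterion scores into one number using the given weights."""
    return sum(scores[criterion] * weight for criterion, weight in weights.items())

# Hypothetical criteria: benefits carry positive weights, costs negative ones.
weights = {"revenue": 0.5, "retention": 0.3, "effort": -0.2}

# Made-up 1-5 scores for two candidate features.
features = {
    "Onboarding wizard": {"revenue": 4, "retention": 3, "effort": 5},
    "Usage dashboard": {"revenue": 3, "retention": 5, "effort": 2},
}

ranking = sorted(
    features,
    key=lambda name: weighted_score(features[name], weights),
    reverse=True,
)
print(ranking)
```

Agreeing the weights as a team before scoring individual features is what keeps the conversation objective.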

Kano Model

With Kano, we look at potential features on a scale of how much “delight” they would provide customers versus the investment required to improve a feature.

Basic features are those needed to sell your product. These “threshold” features are required – but continuing to invest in them may not improve customer delight dramatically.

Performance-related features give you a proportionate increase in customer satisfaction as you invest in them.

Excitement features will yield a disproportionate increase in customer delight.

Opportunity Scoring

Opportunity Scoring is a type of Gap Analysis that comes from Outcome-Driven Innovation. We measure and rank opportunities based on their importance versus customer satisfaction.

We ask customers to score the importance of each feature and then also score how satisfied they are currently with that feature.

The opportunities are those features that are highly important yet customers gave a low satisfaction score.

Our Opportunity Scoring model gives us innovation ideas by evaluating our competitors and our current customers’ relative satisfaction with features.
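The importance-versus-satisfaction gap can be turned into a single number. The sketch below uses the formula commonly cited for Outcome-Driven Innovation, importance plus the unmet gap (never less than zero); this post does not state a formula, so treat it as an assumption, and note that the feature names and survey scores are invented.

```python
def opportunity_score(importance, satisfaction):
    """Importance plus the unmet gap; over-served features get no bonus."""
    return importance + max(importance - satisfaction, 0)

# Hypothetical average survey scores on a 1-10 scale: (importance, satisfaction).
survey = {
    "Bulk editing": (9, 3),
    "Custom themes": (5, 8),
    "Offline mode": (7, 6),
}

ranked = sorted(survey, key=lambda name: opportunity_score(*survey[name]), reverse=True)
for name in ranked:
    print(name, opportunity_score(*survey[name]))
```

"Bulk editing" scores highest because customers rate it as very important yet are poorly served today.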

Affinity Grouping

Affinity Grouping can be used to help categorise our product development efforts. Everyone brainstorms together and writes opportunities on sticky notes.

As a team, begin to group similar items together and name the groups.

Finally everyone ranks the groups in order of importance.

Story Mapping

Story Mapping is a great way to document the MVP by organising and prioritising user stories.

Create task-oriented story cards then group them into a workflow.

Arrange the cards in priority order for each group.

Finally draw a line across all the stories to divide them into releases/sprints.

The 4 Foundations Of Product Development Success

August 9, 2018 By Mark Proctor

Everyone wants success with their product – we crave it. What is the secret? There is no one secret but I believe there are 4 cornerstones to success. The good news is they make a nice acronym – PELT…

Process

Someone once said about the randomness of finding love through traditional dating, e.g. meeting someone special in a bar: ‘serendipity is the sign of an inefficient market’. Relying on serendipity and guesswork is also the sign of a product development process that will ultimately fail. Sure, inspiration and gut instinct have a place in your product development process – but they can’t be the process!

The two biggest secrets to product development success are predictability and repeatability. Is my process predictable, or do I not know what will happen from one month to the next? We need predictability; without it we cannot make the process repeatable. In the same way you have fortnightly sprints with Scrum, you need a prioritised, iterative and time-boxed product development process to achieve success. Your product development process simply has to be repeatable. No ifs, no buts.

You see, there is no such thing as product requirements – only product guesses. Regardless of expertise or seniority, nobody knows for sure how users will react to a new feature. We need everyone in our organisation to understand that our product’s evolution should be built on a solid, disciplined process – without it we will end up with unhappy team members and unhappy users.

Evolution vs Revolution

You have a product manager, a UX team, a development team. With a new product we can put the team to work and build something great. Version 1 of the software is not bad, Version 2 is good, Version 3 is very good, Version 4 might be the best version of the software it will ever be. Then what happens? We have expensive resources that we don’t want sitting around doing nothing. Let’s launch version 5 with even more new features. The problem? There will be a point where more features don’t add any more value for the customer; in fact, they may subtract value.

As a product manager you need to be a curator for the product. A curator at a museum chooses which pieces of Art should be included in the collection. They say “No” to many items. A building full of Art that hasn’t been carefully selected isn’t a museum, it’s a warehouse.

I have seen applications that were a “warehouse” of features in need of greater focus and “tasteful selection” for what should be included. My mission now is to find a better way to communicate the message that “less is more”. Jason Fried’s product maxim puts it well: “Our products do less than the competition”. Once your product has delivered the Revolution, it needs to enter the Evolution phase.

Linked to Business Goals

What does a low-performance company look like? We know siloed teams are the biggest detriment to high performance. Equally, we see another big challenge, especially as our organisation grows. Teams start to focus on what’s good for the individual team – not the company as a whole. We need a way to tie department and company goals together.

Studies have shown that committing to a goal can help improve team performance. Setting challenging and specific goals can further enhance your team’s engagement in attaining those goals. Google has used “Objectives and Key Results” (OKRs) to set ambitious goals and track progress since it was a 50-person company.

The main benefit here is to keep vision, goals and objectives always in front of the team. They will then know exactly what’s expected of them.

Most people fail in life not because they aim too high and miss – they fail because they aim too low and hit. Our aim is to set very ambitious goals. OKRs can enable teams to focus on the big bets and accomplish more than anyone thought possible.

Talent

Process is important but it will only get you so far. Attracting and retaining talent is any digital organisation’s biggest challenge. We are in a new era. The traditional office and compensation models are no longer appealing. Candidates want to know you value work-life balance and individuality.

Ensure you have clarity about what your mission is. We don’t want a ‘job’; we want to work with teammates who share the same passions. Authentically express your mission and culture on relevant channels.

To stand out and attract top talent you must be able to articulate and share how your employee value proposition is lived every day, along with the vision and mission of your company. You have a brand for the external market – you also need an employee brand that can effectively communicate the employee experience.

How much productivity uplift can your organisation get from top talent? A recent study of 600,000 researchers, entertainers, politicians, and athletes found high performers are 400 percent more productive than average ones. Two studies of businesses show similar results and reveal the gap rises with job complexity.

In highly complex occupations the effect of talent is even more astounding – high performers are a remarkable 800 percent more productive.

Be under no illusion; the relationship between quality of talent and business performance is dramatic…

The 3 Stages Of A Winning MVP

June 1, 2018 By Mark Proctor

CB Insights found in 100 failed startups that the number one cause of failure (42%) was ‘no market need’. So almost half of these startups spent time, effort and money building a product before they found out… they were wrong in their core assumption: users needed their product.

So how do you ensure you don’t fall into this trap? Follow the full MVP best practice.

Listen: customer discovery

Is this worth working on, is there anyone out there who wants this?

Experiment: product discovery

How can we make this work? Will people use our solution to solve their problem?

Execute: product delivery

How can we build this efficiently? How can we ensure what we are building is of good enough quality?

Feature Volume vs Feature Delight

Consider Jussi Pasanen’s MVP Pyramid Model and contrast its two product pyramids. On the left, we have “many features, none of them good”. On the right, we have the “fewer features, all of them delightful” approach.

Minimum Viable Product vs Minimum Delightful Product

Using a merely viable product is like visiting someone in an intensive care unit. They’re alive, but not fun to spend time with.

The challenge with an MVP, i.e. building only what you need, is that you may validate the product – but in a hyper-competitive environment, that’s not enough. Delightful products are products users fall in love with. They immediately become part of a person’s life or work. When a product is delightful it just makes sense. The product is intuitive and your experience is highly satisfying. Delightful products are adopted faster, get better word of mouth, and create higher satisfaction.

Product Gestalt

The definition of gestalt – the whole is more than the sum of its parts. The product gestalt is the “soul” of the user experience. A combination of user experience and functionality that makes the product ‘just work’. A gestalt is the part of a product that remains constant over time:

  • The simple search box and results page on Google
  • The friends list and news feed on Facebook

A great gestalt is not as simple as accumulating the right features. It’s the ‘secret sauce’ of the right elements working together in harmony. The end goal? Users stop thinking about the technology and simply achieve their goals.

Making your product meaningful

If a product isn’t meaningful for users, no amount of amazing design will make it successful.

Foursquare is a good example of this. Foursquare was a well-designed app and pioneered many of the gamification mechanisms we still use today. While they grew quickly, users eventually lost interest, because most didn’t see the purpose of the app. Foursquare wasn’t helping them move towards pleasure or away from pain in a meaningful way.

Nir Eyal, author of the book Hooked: How to Build Habit-forming Products, states that a product should be designed to facilitate a user’s need but ultimately alleviate a symptom of a problem they have.

The three elements required for any effective behaviour change are: motivation, triggers, and ability.

In a marketing environment like a landing page, motivation can be seen as the emotional backdrop that creates the desire for the consumer to continue along the sales funnel.

This page from Tiffany & Co embodies the emotion involved in a couple’s engagement, providing ample motivation for the user to click through and progress through the funnel:

[Image: Fogg behavior model, Tiffany & Co landing page]

Incorporating these three things well into your product will create an optimal user experience. Deciphering a user’s motivation, their emotional state, and the behaviours they might exhibit that could point to a problem that needs solving are all important parts of design psychology and of creating that elusive delightful user experience.


Mark Proctor - Copyright © 2026 - Privacy