Analyzing Lessons Learned with ChatGPT - Pitfalls and Limitations

A perspective on using AI for lessons learned analysis.

Across industries, professionals are under pressure to make sense of ever-growing volumes of lessons learned. For many, the lessons are captured in spreadsheets that stretch to hundreds of entries. Faced with this challenge, it is tempting to paste the entire dataset into a tool such as ChatGPT and ask it to “summarize the lessons and provide recommendations.”

At first glance, this may appear to be a quick and cost-free solution. But in practice, this approach introduces significant problems in terms of quality, security, and data integrity.


1. Data Quality and Confidentiality Risks

The first challenge is the data itself. Most spreadsheets of lessons include a proportion of low-quality entries: lessons without a clear problem, root cause or recommendation. Many entries have never been reviewed, and some duplicate other lessons. These provide little analytical value, but when dumped into ChatGPT they still influence the output and distort the results.

More seriously, many lessons contain sensitive information: internal project issues, client project details (potentially subject to dispute), or proprietary processes. Entering this data into unsanctioned AI tools risks breaching confidentiality agreements or exposing corporate knowledge. For many enterprises this is a clear compliance red line, yet Excel databases of lessons learned are nearly impossible to control.


2. The Technical Limits of Large Prompts

Large Language Models (LLMs) such as ChatGPT operate within a fixed context window: the maximum amount of text they can consider at one time. While these limits are growing, pasting hundreds of lessons often exceeds this boundary (a rough token count, sketched after the list below, shows how quickly a register hits it). The result is that:

  • Some lessons are truncated or ignored.
  • The model may focus on the most recent or longest entries rather than giving balanced attention.
  • Subtle but important patterns can be lost.
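
A rough token count makes the boundary concrete. The sketch below uses OpenAI's open-source tiktoken tokenizer; the register size and the 128,000-token limit are illustrative, since limits vary by model.

    import tiktoken  # pip install tiktoken

    def count_tokens(texts, encoding_name="cl100k_base"):
        """Approximate token count for a list of lesson texts."""
        enc = tiktoken.get_encoding(encoding_name)
        return sum(len(enc.encode(text)) for text in texts)

    # Stand-in for a real register: 400 lessons of roughly 120 words each.
    lessons = ["Lesson: supplier delivered castings late because ... " * 20] * 400

    total = count_tokens(lessons)
    print(f"{total:,} tokens")
    if total > 128_000:  # illustrative context limit; varies by model
        print("This register will not fit in a single prompt.")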

Even when the text fits, ChatGPT is being asked to compress a long, unstructured text into a shorter, meaningful output. AI perception differs from human perception: an LLM will not evaluate a theme the way a human analyst or a clustering algorithm would. It has no inherent way to distinguish between recurring themes and isolated outliers, and may give a one-off incident the same weight as a systemic issue.

This is because an LLM such as ChatGPT works from statistical patterns in the text, paying disproportionate attention to features such as length, tone and emphasis. This introduces two common distortions, which the sketch after this list probes:

  • Recency bias – the attention mechanism favors tokens near the end of the context window. If the last 20 lessons are about schedule delays, the model may report delays as the dominant theme, even if 80 earlier lessons were about supplier quality.
  • Repetition / length bias – longer lessons contribute more tokens, and strongly worded entries (e.g. “Critical safety failure”) attract more attention than frequent issues described in plainer terms. The result is that rare but emphatic lessons can dominate over common, systemic issues with understated language.
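
These biases are easy to probe. A minimal sketch, assuming the official openai Python client, an OPENAI_API_KEY in the environment and an illustrative model name: submit the same lessons twice in different orders, and if the two summaries stress different themes, ordering rather than content is steering the output.

    import random
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def summarize(lessons, model="gpt-4o-mini"):  # model name is illustrative
        prompt = "Summarize the dominant themes in these lessons:\n\n" + "\n".join(lessons)
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    lessons = [
        "Supplier castings failed inspection twice.",
        "Assembly slipped two weeks after a late design review.",
        # ... hundreds more in a real register
    ]
    shuffled = random.sample(lessons, k=len(lessons))  # same lessons, new order

    print(summarize(lessons))
    print(summarize(shuffled))  # materially different emphasis suggests order bias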

When a Large Language Model is given hundreds of lessons in a prompt, it does not perform a structured thematic analysis. Instead, it processes the text as a flat sequence and generates a compressed summary, which leads to further issues:

  • Generic outputs – without clear thematic boundaries, the model tends to fall back on broad, catch-all recommendations such as “improve communication” or “enhance documentation”.
  • Inappropriate merging – lessons that are unrelated in practice may be pulled together under the same heading, producing summaries that feel vague or inconsistent.

A Project Manager who loads lessons into ChatGPT hopes the tool will analyze trends and provide insights, but the results are rarely satisfactory, and attempts to tweak and improve the output are hampered by the probabilistic nature of the LLM, as explored below.


3. Variability and Lack of Repeatability

Large Language Models are inherently probabilistic: the same input can produce different outputs from one run to the next. When hundreds of lessons are submitted in a single prompt, this variability becomes more pronounced (see the sketch after this list):

  • Inconsistent summaries – two identical requests may generate different groupings, priorities or recommendations.
  • Shifting emphasis – themes highlighted in one run may be absent or minimised in another, creating uncertainty about what really matters.
  • Reduced reliability – outputs that change unpredictably are difficult to trust in professional settings where lessons must inform decisions and planning.
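
This variability can be demonstrated, and only partly tamed. As a sketch, under the same client assumptions as earlier: the OpenAI API exposes a temperature parameter and a seed parameter that its documentation describes as best-effort, so even the settings below narrow rather than eliminate run-to-run differences.

    from openai import OpenAI

    client = OpenAI()

    prompt = (
        "Summarize the dominant themes in these lessons:\n\n"
        "1. Supplier castings failed inspection twice.\n"
        "2. Assembly slipped two weeks after a late design review.\n"
    )

    def summarize_repeatably(prompt, model="gpt-4o-mini"):  # illustrative model
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # suppress sampling randomness
            seed=42,        # best-effort determinism, not a guarantee
        )
        return response.choices[0].message.content

    run_1 = summarize_repeatably(prompt)
    run_2 = summarize_repeatably(prompt)
    print(run_1 == run_2)  # often True, but identical output is not guaranteed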

For lessons learned to inform governance, risk reduction and planning, the outputs must be defensible. When outputs change unpredictably or cannot be regenerated, an organization cannot rely on them with confidence.


A Better Approach

We apply a combination of machine learning and AI techniques implemented specifically for lessons learned analysis.

Lessonflow ensures that lessons are pre-processed, scored and filtered before entering our semantic workflow. This means that quality lessons are grouped using advanced techniques that identify similarity in meaning, not just wording.
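
Lessonflow's scoring model is its own; the sketch below, with hypothetical field names, only illustrates the first stage of such a pipeline: drop entries missing a problem, root cause or recommendation, then drop near-duplicates before any semantic analysis.

    from difflib import SequenceMatcher

    REQUIRED_FIELDS = ("problem", "root_cause", "recommendation")  # hypothetical schema

    def is_complete(lesson):
        """Keep only lessons with every key field filled in."""
        return all(lesson.get(field, "").strip() for field in REQUIRED_FIELDS)

    def full_text(lesson):
        return " ".join(lesson[field] for field in REQUIRED_FIELDS)

    def deduplicate(lessons, threshold=0.9):
        """Drop lessons whose text closely matches an earlier entry."""
        kept = []
        for lesson in lessons:
            if not any(
                SequenceMatcher(None, full_text(lesson), full_text(k)).ratio() > threshold
                for k in kept
            ):
                kept.append(lesson)
        return kept

    # raw_lessons would be loaded from the register in practice
    raw_lessons = [
        {"problem": "Castings failed inspection",
         "root_cause": "Supplier skipped heat treatment",
         "recommendation": "Add heat-treatment certs to receiving checks"},
        {"problem": "Castings failed inspection again",
         "root_cause": "Supplier skipped heat treatment",
         "recommendation": "Add heat-treatment certs to receiving checks"},
        {"problem": "Schedule slipped", "root_cause": "", "recommendation": ""},
    ]

    clean = deduplicate([l for l in raw_lessons if is_complete(l)])
    print(len(clean))  # 1: one incomplete entry and one near-duplicate removed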

A full analysis of themes and topics is performed in multiple steps before a generative AI provides focused summaries and recommended actions (a simplified sketch of semantic grouping follows the list below). This results in:

  • Clear, actionable outputs – instead of generic ChatGPT text, practical summaries and recommendations that teams can act on.
  • Consistency and trust – probabilistic output and distorted interpretation are replaced with consistent, sensible actions by applying generative AI where it is most appropriate.
  • Enterprise-level Security – sensitive data remains within a controlled environment, rather than being loaded into unsanctioned AI tools.
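
As an illustration only, not Lessonflow's implementation: grouping by meaning rather than wording can be sketched with the open-source sentence-transformers and scikit-learn libraries, where the model choice and distance threshold are illustrative.

    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
    from sklearn.cluster import AgglomerativeClustering    # pip install scikit-learn

    lessons = [
        "Supplier delivered castings three weeks late.",
        "Late delivery from the casting vendor delayed assembly.",
        "Design review was skipped under schedule pressure.",
    ]

    # Embed each lesson so that similar meaning maps to nearby vectors.
    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    embeddings = model.encode(lessons)

    # Group by semantic similarity rather than shared keywords.
    clusterer = AgglomerativeClustering(n_clusters=None, distance_threshold=1.0)
    labels = clusterer.fit_predict(embeddings)

    for label, text in sorted(zip(labels, lessons)):
        print(label, text)
    # The two supplier-delay lessons should land in the same cluster
    # even though they share little wording.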

Conclusion

Using ChatGPT directly on a spreadsheet of lessons may make for a quick experiment, but it is neither reliable nor secure. For organizations that take knowledge management and performance improvement seriously, it introduces more risk than value.

By applying structured machine learning and AI methods, it is possible to achieve analysis that is not only more accurate and actionable, but also compliant with the standards expected in modern organizations. The outcome is a process that delivers real insights while protecting the integrity of the data.

Explore how structured AI analysis can transform lessons into real organizational learning.

Ready to learn more?

Download our guide to explore the essential elements of an effective lessons learned program and see how your current approach stacks up against best practices.