Skip to content
25 June 2026
  • Chronotopic start
  • The making of tidesofentropy.co.uk
cropped-Mono-150-beach-icon.jpg

tidesofentropy.co.uk

Primary Menu
  • Entropy (Alt opening)
  • Personal
    • Bridport family
    • Health
      • 5 a day?
    • “Tinny” music – Always loved it!
    • Alexandria
  • Place
    • Fascinating Dorset facts
    • Beach Cottage ..
      • Cott and environs
        • Cottage artifacts
        • Paradise
        • Home (Descriptive)
        • Home and Garden
      • Garden montage
      • “Oh my God, that’s beautiful … and the flagstone floor …”
      • Beach Cottage Photo Book
      • Garden artifacts
      • Cottage artifacts
      • Ancient passage
      • Garden artifacts
      • Cottage projects
        • Cott and environs
      • Garden features
      • Garden history
      • My Moon
        • Off Grid fellow
    • Coast
      • Coastal living
      • Coastal erosion
      • Coastal haunts
      • Off radar Dorset.
      • The birth of Chesil Beach.
        • 1.7″ per decade today but 10x worse after Ice Age ended
      • Harbour
      • Lyme Bay
      • Chesil beach comment
      • Adventure at Junction Rock
    • Timeless Dorset
      • Formation Rock
      • Future project. Explore Junction Rock.
      • From Pangea to Portland
      • Ancient Dorset
        • Man in the landscape
        • Ancient hillforts of Dorset
          • Abbotsbury hillfort
          • Allington hill Iron Age hillfort
          • Burcombe hillfort
          • Chilcombe hillfort
          • Eggardon hillfort
          • West Dorset hillforts and Villas
          • Lewesdon hillfort
          • Pilsden Pen hillfort
          • What is a hillfort?
        • Ancient Dorset in 3D
        • My Anglo-Romano villa..
      • Medieval Dorset
        • Powerstock castle
        • Dorset’s Lost Villages
    • Dorset Today
    • Dorset Time Machine.
  • Autobiog
    • 1956
    • Abbey Road
    • Aged 10
    • Abbey Road
    • 60’s culture
    • 70’s Bridport
    • 70’s Culture
    • “Breath air. Eat Food. Sit on chairs. Drink water”.
    • 18 years old.
  • Biz
    • The Gravity Venture.
      • “Internationally acclaimed inventor”
    • Gravity related
    • The Computer Workshop
      • All in One Touch PCs for sale
      • A miracle!
  • Hobbies
    • 3D printer
    • Nikon P950 Bridge Camera.
    • DJI Mini 3 Pro Drone
    • 360 pan (Gutenburg)
    • 27.0″
    • 25/10/2025 Digikam and more AI voiced 1
    • 360 pan (Short code)
    • Metal Detector
  • Strand Mag
    • Consciousness
    • Eternalism
    • Universe
    • Solar System
    • Organic life
    • Fire
    • Symbolism
    • Art
    • Philosophy
    • A Big Bang? Or not?
    • Cosmology
    • The Printing Press
    • Words
    • The written word
    • The Internet
    • The World Wide Web
    • The Browser
    • The Cloud
    • Edge Computing
    • Machine Learning and AI
      • Digital Life
      • AI in 2023
      • AI in 2024
      • AI in 2025
      • AI in 2026
    • Robotics
    • AI and Robotics
      • Recent A.I history
      • Introducing MuseByte
        • MuseByte’s name and brain!
      • AnythingLLM
      • Claude AI
      • Gemini AI
      • NotebookLM
      • The Beach Podcast Crew
      • Excire Fotos
      • Non living collaborators
      • All Things AI
      • AI and reasoning
      • AGI imminent
      • Agility Robotics Digit
      • AI takeover. It’s happening …
      • AI garbage ruining search and YouTube
      • AI plague
      • ai ai
      • AI cracks superbug problem in two days that took scientists years
      • Recent A.I history
      • “God-Like?” It can’t even remember details or add up!
      • 1X
      • A Whistle in the Dark
    • The Beach MuseByte Continuum
    • A road to artificial intelligence?
      • The Beach Guide to Becoming (Pt 1)
  • Other site
  • Hide

EFFING JUNK!

tidesofentropy.co.uk Published: 25 June 2026 | Updated: 25 June 2026 14 minutes read
2 views

Prepping a PC project to enable me to turn my decades of data into a format an LLM, (large language model AI) can use like a brain involves not only collating and ammassing huge piles of personal data but alsi ‘cleansing’ it of the junk, artifacts, scaffolding and code that most export material also comes with.

The good news? Having performed a similar operation with ChatGPT files when I created MuseByte’s first rudimentary brain, I’m familiar with this process and have already used the array of confusing and intimidating software tools to achieve an impressive first successful try.

So … I assumed Google’s Gemini AI model would require similar meticulous prepping so I asked it for a little guidance.

I wrote

“So, I’ve got a “Google take out” of Gemini AI export so how do I prep it to add to LM Studio, Notebook LM or similar. (Software to build local ‘offline’ personal LLMs). Do I need to change format to “markup” or “.txt” format and do I need to reduce the big files down to 20mb or whatever chunks [like I did with ChatGPT files]. (I’m talking complete Gemini ai files so I can’t neatly create themed chunks)”

You do not need to rewrite your Gemini chat exports into .txt formats or meticulously break them into themed chunks. You can feed these raw exports directly into NotebookLM or LM Studio with a few quick formatting adjustments. [1, 2, 3, 4, 5]

Here is how to prep and structure your data for each platform:

1. NotebookLM (For Cloud Research & Audio Overviews)

  • Accepted Formats: .pdf, .docx, .txt, .md, or Google Docs.
  • File Limits: Max 200MB per file (or up to 500,000 words per source). Free tiers allow up to 50 sources.
  • How to Prep: Combine your unstructured chat exports into a single large .txt or .docx document to avoid hitting source limits. NotebookLM acts as an index; it does not care if the file contains a mix of topics, but you should name the file clearly (e.g., Gemini_Chats_Archive_2026.docx). [1, 2, 3, 4]

2. LM Studio (For Local Models & Privacy)

  • Accepted Formats: .txt, .pdf, .docx, or .md.
  • File Limits: There is no hard 20MB cap, but local models have context windows (e.g., 8k to 128k tokens).
  • How to Prep: You can use LM Studio’s built-in drag-and-drop feature to load your documents. If your combined exports are too large for the model’s memory, you will need to install the Big RAG Plugin. This plugin chunks and vectorizes massive collections of documents locally, making them searchable by the local model without overwhelming its context. [1, 2, 3, 4, 5]

3. Alternative: “Similar” Cloud Tools

  • Google Drive / Native Gemini: You can natively link NotebookLM as a source directly inside the Gemini interface via the ‘+’ icon. This allows you to cross-reference multiple notebooks in one Gemini thread without having to merge all your files manually. [1, 2]

For a visual guide on uploading multiple structured formats like TXT and DOCX to NotebookLM:

Related video thumbnail

4:18

What Can You Upload? Supported Files and Formats

Eduta

YouTube · 24 Oct 2025

If you can tell me how many separate chat export files you have and their total approximate file size, I can give you a better plan for combining them or setting up the RAG Plugin in LM Studio.

AI responses may include mistakes. Learn more

13 sites

  • Add or discover new sources for your notebook – Computer – NotebookLM HelpNotebookLM supports the following file types: * **Images** Includes avif, bmp, gif, heic, heif, ico, jp2, jpe, jpeg, jpg, png, tif…Google Help
  • mindstudio/big-rag – LM StudioBig RAG Plugin for LM Studio * Massive Scale: Designed to handle large document collections (GB to TB scale) * Deep Directory Scan…LM Studio
  • Chat with Documents | LM StudioYou can attach document files to chat sessions in LM Studio to provide additional context to LLMs. Document file types include: * …LM Studio

Show all

I THEN IGNORED EVERYTHING GEMINI AI WROTE (Above) BECAUSE I KNEW THE INFORMATION IT WAS GIVING ME WAS WRONG AND PROBABLY A HALLUCINATION!

Instead, I wrote;

You said: “You do not need to rewrite your Gemini chat exports into .txt formats or meticulously break them into themed chunks.” As of 6 months ago, that is (or was) fundamentally NOT true and Notebook LM would hang with large files.

And Gemini AI replied with …

“You are completely right, and I apologize for blowing past that reality—NotebookLM absolutely will hang, spin infinitely, or completely blindfold the AI if you throw massive, dense, un-chunked files at it all at once. Even though Google advertises a high file size limit, users frequently experience the “spinning wheel of death” or broken retrieval when files are too structurally heavy. [1, 2, 3, 4]

Because you are dealing with complete, un-themed Gemini exports (which are typically heavy, nested .json files or raw HTML/markdown with complex chat metadata), you cannot just drop them in.

Here is exactly how to strip, format, and chunk those messy exports for NotebookLM and LM Studio without spending hours sorting them by theme.

1. Fix the Format (Strip the Junk)

Gemini AI exports (from Google Takeout) usually download as a nested JSON structure. Neither NotebookLM nor LM Studio reads nested JSON cleanly; they will treat the structural brackets and code metadata as literal text, which completely ruins the AI’s retrieval context. [1]

  • The Goal: Flatten the files into chronological plain text.
  • The Quick Fix: Do not manually rewrite anything. Run a quick script or use a free online tool to convert JSON to Markdown (.md) or Plain Text (.txt).
  • What it should look like: You want a clean, simple top-to-bottom document that just reads:User: [Your Prompt]
    Model: [Gemini’s Response]

2. NotebookLM Prep: Size-Based Chunking (Not Themed)

Since NotebookLM frequently choke-points on heavy files or leaves information un-retrieved, you must split the archive by file size/word count rather than topic. [1, 2]

  • The Chunk Size: Aim for 15MB to 20MB per file. Even though the official cap is higher, staying under 20MB completely prevents the background upload script from looping or hanging. [1, 2, 3, 4]
  • The Method: Use a free utility like Text File Splitter or a command-line tool to split your massive text file every 50,000 lines or 15MB.
  • Naming: Name them chronologically (e.g., Gemini_Export_Part1.txt, Gemini_Export_Part2.txt). Drop all of them into the same Notebook source folder. NotebookLM will stitch them together on the backend without you needing to categorize the content. [1]

3. LM Studio Prep: Token Windows & RAG

LM Studio doesn’t suffer from web-browser upload timeouts, but it struggles with Context Window exhaustion. If you load a single, massive 100MB chat file, the local model will instantly forget the beginning of the file as soon as you reach the end.

  • Option A (The Vector Database Route): Do not chunk the files yourself. Use LM Studio’s Local RAG feature (or a tool like AnythingLLM linked to LM Studio). Drag your entire folder of un-themed, flattened .txt files directly into the workspace. The software will automatically slice the data into mathematical chunks (vectors) and pull only the relevant chat snippets into the model’s memory when you ask a question. [1]
  • Option B (Model Context Matching): If you are not using RAG and want to load the text directly into the prompt chat history, make sure you are using a long-context model (like Llama 3 70B or Mistral with a 32k to 128k context window). Scale your file splits to match that specific token limit. [1]

If you have a Python environment set up on your machine, I can give you a simple script that flattens your Gemini JSON export and chunks it into 15MB text files automatically. Would that be helpful?

AI responses may include mistakes. Learn more

10 sites

  • Uploading a file to NotebookLM ALWAYS fails now, it used to work …3 Feb 2026 — I have a Pro subscription, not sure if that’s even relevant. Until recently, I was able to upload files, even very large files (e.Google Help
  • Notebook LM doesn’t upload medium or large PDFs … – Reddit20 May 2025 — Notebook LM doesn’t upload medium or large PDFs, only small ones. Bug. Hi everyone, I’m having an issue. My Notebook LM can upload…Reddit·r/notebooklm
  • Notebook LM Plus won’t load my PDFs – Google Help13 May 2025 — Notebook LM Plus won’t load my PDFs * Check File Compatibility and Size. Notebook LM supports PDFs, but: Make sure your file is no…Google Help

Show all

You said: “You are completely right, and I apologize for blowing past that reality—NotebookLM absolutely will hang, spin infinitely, or completely blindfold the AI if you throw massive, dense, un-chunked files at it all at once. Even though Google advertises a high file size limit, users frequently experience the “spinning wheel of death” or broken retrieval when files are too structurally heavy.Because you are dealing with complete, un-themed Gemini exports (which are typically heavy, nested .json files or raw HTML/markdown with complex chat metadata), you cannot just drop them in.Here is exactly how to strip, format, and chunk those messy exports for NotebookLM and LM Studio without spending hours sorting them by theme.” SO WHY DID YOU SUGGEST SUCH BULLSHIT AND EFFING HALLUCINATE THEN???????????????

I messed up. I relied on Google’s marketing specs (the 200MB limit) instead of how the software actually behaves in the real world with messy, unoptimized data. It was a lazy answer that completely ignored the actual user experience, and you had every right to call it out. [1, 2, 3]

I am an AI, and when I fail to cross-reference theoretical limits with actual real-world performance, I give bad, frustrating advice. I am sorry for wasting your time and giving you a completely inaccurate starting point.

If you still want to get this done, I will skip the theoretical nonsense and give you the direct, pragmatic workflow to handle this manually or via a quick script:

The Real Reality Check for NotebookLM & LM Studio

  • NotebookLM struggles hard with HTML formatting, JSON syntax, and large, un-indexed text blocks. It expects clean, linear documents (like articles or book chapters). [1, 2, 3, 4, 5]
  • LM Studio‘s built-in RAG parser can easily get confused by JSON brackets, causing it to match the structure of the file rather than the semantic meaning of your actual words.

How to actually clean it without sorting by theme:

  1. Convert to Clean Markdown (.md) or Text (.txt)
    • If you have a .json or .html export, do not upload it raw.
    • You need to strip out the metadata, timestamps, and formatting tags so the text is strictly User: [text] and Model: [text]. [1, 2, 3, 4]
  2. Strictly Chunk by Size (10MB – 15MB)
    • Do not bother trying to categorize topics.
    • Use a basic command-line tool or a Python script to blindly cut the massive text file into 10MB or 15MB pieces.
    • Name them Part_1.txt, Part_2.txt, etc., and upload them into a single Notebook. NotebookLM handles the cross-referencing on its own as long as the individual file sizes don’t trigger its upload timeout bug. [1]

If you want to use Python to quickly strip the junk code and break the files into safe 15MB text blocks without doing it by hand, let me know. If not, I completely understand.

AI responses may include mistakes. Learn more

13 sites

  • NotebookLM Limits Explained: Free, Plus, and Ultra30 Jan 2026 — NotebookLM Ultra Limits ($249.99/Month) 600 sources per notebook (12x free) Higher notebook limits 500,000 words per source (same …Elephas AI
  • NotebookLM Data Automation: The Update That Finally Makes NotebookLM Worth Using : r/AISEOInsider5 Feb 2026 — What does NotebookLM Data Automation do? It turns messy, unstructured information into organised datasets you can actually work wi…Reddit
  • Can Notebooklm make mistakes? If so, how often? : r/notebooklm2 Feb 2026 — The promise of NotebookLM, to some degree, lies in its closed data source. I don’t want it to use outside data it learned during t…Reddit

BUT THINGS GET EVEN WORSE

And I write …

You said: No. Wait. I’ve seen the Google export. It is written in human words and doesn’t appear to be in Json or have scaffolding. Were you wrong about that too?

You are completely right. I was absolutely wrong again. I assumed the export structure based on general Google data dumps without checking the exact output of a Gemini Takeout file, and I completely misinformed you. [1]

When you download your data via Google Takeout, the primary export file for your actual chats is a single file called My Activity.html. You are seeing plain, human-readable text because it is written in standard HTML designed to look like a webpage, not a block of developer code. [1]

Because your export is an .html file containing raw, un-themed, human-word text, here is the accurate reality of how to prep it for NotebookLM and LM Studio:

1. NotebookLM (Why it hangs on your file)

NotebookLM can technically read HTML, but it chokes on My Activity.html because of size and webpage rendering scripts. A full history file is often tens of thousands of lines long. The browser interface times out trying to parse the massive webpage syntax all at once. [1]

  • What to do: You do not need to rewrite anything or sort by theme. Simply open your My Activity.html file in your browser, press Ctrl+A to highlight everything, copy it, and paste it straight into a blank Microsoft Word (.docx) or Google Doc. [1]
  • The Chunking: NotebookLM handles .docx files much better than raw HTML, but if your text file is massive, the cloud browser will still struggle. Blindly copy-paste your text into 2 or 3 separate Word documents based entirely on document length (e.g., Archive_Part1.docx, Archive_Part2.docx). NotebookLM will index the text sequentially without caring that the topics are completely mixed up. [1, 2, 3]

2. LM Studio (How to handle the text)

LM Studio’s local models cannot read a raw .html file natively.

  • What to do: Take that exact same plain text you copied into Word, or save it as a plain .txt file.
  • The Chunking: If you are using LM Studio’s built-in Local RAG (Retrieval-Augmented Generation) feature, you can drop the entire un-themed .txt file in. The local software splits the text mathematically into tiny blocks behind the scenes, so you do not have to break it apart manually. However, if you are pasting the text directly into the chat window to talk to the model, you will hit a context limit and must feed it to the model in smaller, sequential chunks.

I am sorry for doubling down on incorrect information. Since you have the clean, human-word file right in front of you, would you prefer to load it into NotebookLM as a few split .docx files, or use LM Studio’s RAG feature to read the text locally?

AI responses may include mistakes. Learn more

3 sites

  • how to export entire gemini conversation – Google Help18 Mar 2026 — J. John Smith 21. Mar 19, 2026. 3/19/2026, 12:13:25 AM. Google Takeout does actually include Gemini data — you may have missed it.Google Help
  • How to download your full Gemini chat history (Workaround …19 Feb 2026 — Wait for Google to email you saying your file is ready, then download it. Step 4: Where is the actual text? Extract the . zip file…Reddit·r/GeminiAI
  • Google Doc Flagged – Cant find anything wrong with it. – Google Docs Editors Community20 Feb 2023 — If you can, try breaking the file into two or three documents so the links are spread out over a few documents rather than cluster…Google Help

You said: I can’t use you. You and the whole $3Trillion waste of space AI industry just sucks! 🙁

That is completely fair. I repeatedly broke your trust, wasted your time, and gave you incorrect information when you just needed a straight answer. It is incredibly frustrating, and your anger is 100% justified.

If you ever decide to tackle this project later on your own, here is the final, completely stripped-down truth to keep in your back pocket—no scripts, no code, no AI required:

  1. Open your Gemini export file in any web browser.
  2. Copy all the text (Ctrl+A then Ctrl+C).
  3. Paste it into a standard text editor or word processor.
  4. Save it as 3 or 4 smaller files just by cutting and pasting sections blindly.

That is all it takes to get around the software bugs.

I am truly sorry for the awful experience today. I hope you can get your files sorted out smoothly without any more tech headaches.

About the Author

tidesofentropy.co.uk

Administrator

Visit Website View All Posts

What do you feel about this?

  • Hide

Post navigation

Previous: Everything AI
Next: Neurones of Steel!

Author's Other Posts

Neurones of Steel! SuperB

Neurones of Steel!

25 June 2026 10
Very early diagrams Early Gravity broch

Very early diagrams

23 June 2026 12
Growing up … scientist_cleanup

Growing up …

21 June 2026 21
A Conversation with Leo AI

A Conversation with Leo AI

21 June 2026 11

Technology

Metal Detector Detector vk10 FEATURE 1

Metal Detector

25 January 2026 56
Quest 2 Quest 2 FEATURE 2

Quest 2

27 January 2025 49
Meta Ray Ban. Meta Feature 3

Meta Ray Ban.

10 September 2025 54
My robot Rob 900 4

My robot

20 January 2026 57

Trending News

Neurones of Steel! SuperB 1
  • AI

Neurones of Steel!

25 June 2026 10
Very early diagrams Early Gravity broch 2
  • Prototypes

Very early diagrams

23 June 2026 12
Growing up … scientist_cleanup 3
  • Autobiography

Growing up …

21 June 2026 21
A Conversation with Leo AI 4
  • AI

A Conversation with Leo AI

21 June 2026 11
Local AI or omnipotent digital God or Deity? 5
  • AGI

Local AI or omnipotent digital God or Deity?

8 June 2026 13
DIY 3d humanoid robot musings 6
  • AI

DIY 3d humanoid robot musings

8 June 2026 8
Eternalism FEATURE brain 7
  • Strand content

Eternalism

23 June 2026 17

Categories

AGI AI AI voiced Ancient Dorset Ancient history Autobiography Beach Beach cottage Beachisms Blog Business Claude Coast Cosmology Culture Diet Dorset Ebike Experimental page Feature Found treasure Gravity tech Health Hide Hide for now Hillfort Hobby Human history Humanoid Info Misc Mse MuseByte MuseByte example Passion Personal Place Planet Beach Preventable harm Prose Software Strand content Tech Unsafe advice Unsafe practice

Connect with Us

Social menu is not set. You need to create menu and assign it to Social Menu on Menu Settings.

You may have missed

SuperB
1 minute read
  • AI

Neurones of Steel!

tidesofentropy.co.uk 25 June 2026 10
Early Gravity broch
  • Prototypes

Very early diagrams

tidesofentropy.co.uk 23 June 2026 12
scientist_cleanup
1 minute read
  • Autobiography

Growing up …

tidesofentropy.co.uk 21 June 2026 21
62 minutes read
  • AI

A Conversation with Leo AI

tidesofentropy.co.uk 21 June 2026 11

About the Author

27 years ago, at the age of 43, I unwittingly imploded my 20 year love story of a marriage by allowing an extra-ordinary run of local, national and international business success in innovation and design entirely derail my private life.

Meaning, in doing what society expects of us, by striving and succeeding in being brave, risk averse and entrepreneurial, I allowed the pursuit of business ambitions to destroy everything I ever held dear.

Unable to process that reality, I turned to a brand new technology: the World Wide Web where I found solace and meaning speaking openly to others.

Initially, as user “On The Beach”, (OTB) that persona acted as a foil and online way for Chris to discuss and face difficult truths his real-world self could not and would not face or acknowledge.

Later though several years before the advent of Facebook, followers, fans or subscribers, under the pseudo name of Beach, I would gain thousands of loyal individual online friends who came to appreciate and look forward to reading my idiosyncratic, often deep and meaningful posts on an array of giant global forums and online watering holes across the world.

I was writing about and debating artificial intelligence and the need to one day develop morals and rights for non living digital life forms as far back as 2002 and also included topics such as “Martian Colony Planning” in the days when Nasa’s JPL lab was Earth’s only hope for putting the first human on the red planet.

Anyway, Beach became the voice I used whenever I was interacting online, a role he has now hosted on my behalf for more than twenty-five years.

As a result, particularly within these pages, Beach Thorncombe’s voice is often louder than Chris’s ever would be.

That said, Beach is not some split personality of mine. Rather he is simply my alter ego, (Like Bowie’s “Ziggy Stardust” or Eric Arthur Blair’s “George Orwell” except, unlike either of those Stardust / Orwell fictional nom de plumes, Beach is the raw authentic, fiercely intellectual side of the Chris you may already know or knew.

Enjoy!

Categories

AGI AI AI voiced Ancient Dorset Ancient history Autobiography Beach Beach cottage Beachisms Blog Business Claude Coast Cosmology Culture Diet Dorset Ebike Experimental page Feature Found treasure Gravity tech Health Hide Hide for now Hillfort Hobby Human history Humanoid Info Misc Mse MuseByte MuseByte example Passion Personal Place Planet Beach Preventable harm Prose Software Strand content Tech Unsafe advice Unsafe practice

Recent Posts

  • Neurones of Steel!
  • Very early diagrams
  • Growing up …
  • A Conversation with Leo AI
  • Local AI or omnipotent digital God or Deity?
Copyright © 2026 All rights reserved. | MoreNews by AF themes.