Conversation with Gemini

The app NotebookLM is pretty good at staying inside the lines and hallucinates less than a general-purpose LLM chat. But does it inevitably carry over some of its own ‘thinking’ when, for example, summing up the entire contents of a new LLM archive after previously absorbing data from another LLM, or large sources from other random or distinct places?

To answer your question accurately, we have to split “thinking” into two separate categories: factual data and the cognitive framework.

The short answer is that while NotebookLM does not leak specific facts or data from one separate project to another, it inevitably uses its own pre-trained “brain” to understand, structure, and interpret whatever new information you give it.

Here is a breakdown of how this works under the hood.

1. Data Isolation: The Sandbox Effect

NotebookLM is built on an architecture called Retrieval-Augmented Generation (RAG). This architecture is designed specifically to prevent the AI from mixing up different data sources.

Across Separate Notebooks: If you upload a massive archive about Machine Learning in Notebook A, and later open a brand-new Notebook B to analyze documents about Ancient Rome, there is zero data leakage. Notebook B has no access to Notebook A’s data.
No Permanent Learning: NotebookLM does not dynamically train or alter the core weights of the underlying AI model (Gemini) when you upload files. Your sources are treated as temporary “open books” that the AI reads on the spot. Once you switch notebooks, that book is closed.
Within the Same Notebook: If you upload distinct, random sources into the same notebook, the AI will carry over concepts between them. If you ask for a summary of Source Y, its understanding may be subtly framed by the context it just read in Source X, because they share the same active workspace.
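To make the sandbox boundary concrete, here is a minimal Python sketch. It is a toy, not NotebookLM's real pipeline (which is not public): the Notebook class is hypothetical, and "retrieval" is simple keyword overlap standing in for embedding search. It shows the two behaviors described above: a query in one notebook never scores another notebook's sources, while two matching sources inside the same notebook both land in one shared context.

```python
from dataclasses import dataclass, field


@dataclass
class Notebook:
    """Hypothetical toy notebook: each instance owns its own sources."""
    name: str
    sources: dict[str, str] = field(default_factory=dict)  # title -> text

    def add_source(self, title: str, text: str) -> None:
        self.sources[title] = text

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return up to k source titles sharing the most words with the query.

        Keyword overlap stands in for real embedding search. Only this
        notebook's own sources are scored, which is the sandbox boundary.
        """
        query_words = set(query.lower().split())
        scored = []
        for title, text in self.sources.items():
            overlap = len(query_words & set(text.lower().split()))
            if overlap:
                scored.append((overlap, title))
        scored.sort(reverse=True)  # highest overlap first
        return [title for _, title in scored[:k]]

    def build_context(self, query: str) -> str:
        """Join retrieved sources into one context block for the model.

        Sources retrieved together share this single context, which is how
        concepts from Source X can shape an answer about Source Y.
        """
        return "\n---\n".join(self.sources[t] for t in self.retrieve(query))


notebook_a = Notebook("A")
notebook_a.add_source("ml", "transformer models use attention")
notebook_b = Notebook("B")
notebook_b.add_source("rome", "the roman senate and legions")
print(notebook_b.retrieve("transformer attention"))  # []
```

In this toy, asking Notebook B about transformers returns nothing, because Notebook A's source is simply not in the list being scored.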

2. Where the LLM’s Own “Thinking” Inevitably Seeps In

Even when data is perfectly isolated in a new notebook, the underlying LLM’s parametric knowledge (everything it learned during its original creation) is still doing the heavy lifting. It cannot turn off its own brain. This manifests in a few distinct ways:

Semantic Bridge-Building

If your uploaded archive contains gaps, highly specialized jargon, or ambiguous phrasing, the LLM will use its base knowledge to make sense of it. For example, if your document notes that “the project utilized a transformer architecture,” NotebookLM automatically knows you mean a type of deep learning model—not an electrical component or an Optimus Prime toy—because its base training provides that linguistic context.

Synthesis and Logic

A pure database tool just copies and pastes exact matches. NotebookLM synthesizes. When it sums up an entire archive, the logical flow, the connection between disparate paragraphs, and the choice of analogies are generated by the LLM’s internal reasoning frameworks. It is applying its own “rules of thumb” for what makes a good, logical summary.

Framing and Tone Bias

LLMs are heavily aligned to be polite, structured, objective, and easy to read. If you upload a chaotic, highly emotional, or poorly formatted archive, NotebookLM will inevitably clean it up and present it in a neat, professional tone. This structural sanitization is a form of the LLM imposing its own “thinking” and stylistic preferences onto your raw data.

The Verdict

NotebookLM is highly effective at “staying inside the lines” because it is built to ground its answers in your documents rather than make up facts from outside them. However, the lens through which it reads, translates, and structures those lines belongs entirely to the base LLM. It won’t remember your past data from other notebooks, but it can never truly strip away its pre-trained cognitive habits.

Are you noticing a specific type of bias or stylistic “ghost in the machine” in the summaries you’ve been generating?

I am, categorically. One distinct project I’ve used NotebookLM on extensively seems to have left it assuming foundational facts or ideas when it now interprets a similar project (the complete data dump from an LLM): a second, supposedly new project that nonetheless echoes the framework and ideas of the original. To my mind, it is using ‘actors’ from the first project inside a summary supposedly meant for the second.

That “ghost in the machine” feeling is incredibly distinct—and finding specific “actors” (names, specific entities, unique variables, or characters) from a past project bleeding into a supposedly clean, new project can make it feel like the AI is breaking its own rules.

Because NotebookLM is architecturally designed to keep separate notebooks strictly sandboxed, seeing these actors cross over means one of a few highly specific technical phenomena is occurring.

The Culprits: How the “Actors” Crossed the Border

1. The “Hidden Cargo” in the Data Dump (Most Likely)

You mentioned that the second project consists of a “complete data dump from an LLM” that echoes the framework of the first. If an LLM generated this data dump, it is highly probable that the actors are actually embedded in the raw text of Project 2, even if you haven’t spotted them yet.

Why this happens: LLMs heavily rely on repetitive synthetic data patterns. If the LLM that generated your second data dump was operating on a similar prompt or framework as the first, it likely hallucinated or intentionally injected those same actors directly into the source material.
The Test: Open your second notebook, go to the search bar, and type the exact name of an actor from Project 1. Because NotebookLM can only cite what is in the active notebook, if it finds a match, those actors were “smuggled” inside your raw data dump all along.

2. The “Gemini Bridge” Extension

Are you interacting with NotebookLM through the standalone NotebookLM web app, or are you using the broader Google Gemini app / Gemini Advanced with NotebookLM extensions turned on?

Why this happens: While the standalone NotebookLM app strictly silos notebooks, the overarching Gemini interface allows the AI to search across multiple notebooks simultaneously to find common themes. If you are querying Project 2 via the general Gemini interface, the AI is explicitly bridging the gap between your notebooks and pulling the actors forward.

3. “Semantic Gravity” and Default Archetypes

If the actors are not highly specific proper nouns (like “Dr. Aris Thorne”), but are instead functional roles, code variables, or common LLM archetypes (like “User_A”, “Agent_1”, “Alice/Bob”, or “The Critic”), you are experiencing the base model’s behavioral gravity.

Why this happens: When an LLM reads a complex, abstract data dump that echoes a framework it is already optimized to understand, it will default to its favorite structural templates. It isn’t remembering your first notebook; rather, both notebooks are triggering the exact same deeply grooved pathway in the base LLM’s brain, causing it to auto-populate the same character roles.

4. An Overlapping Source Workspace

It is worth double-checking the workspace structure. In NotebookLM, it is easy to accidentally upload new files as new Sources within the same original notebook, rather than creating an entirely new Notebook from the home dashboard.

If Project 1 and Project 2 exist as different documents inside the same notebook sidebar, the LLM’s context window sees them simultaneously. It will naturally use the established actors from Project 1 to make sense of the abstract data in Project 2.

How to Force a Truly “Clean” Interpretation

If you want to strip away this carryover and force NotebookLM to look at the second data dump with completely unclouded eyes, try this diagnostic reset:

Create an entirely new Notebook from the main dashboard (do not just clear the chat of the old one).
Sanitize the Data Dump: Run a quick “Find” command on your raw Project 2 text files on your computer for those specific actor names to see if the LLM dump is contaminated.
Use a “Tabula Rasa” Prompt: When you ask for a summary in the new notebook, explicitly instruct it: “Summarize this data using only the explicit terminology, entities, and names present in these specific text files. Do not use generic archetypes or external placeholders.”
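If you are comfortable running a script, the “Find” step can cover a whole folder at once. This is a minimal Python sketch, assuming your Project 2 dump is saved as .txt files in one folder; the function name and folder layout are illustrative, not part of NotebookLM. It counts whole-word, case-insensitive mentions of each actor name per file, so “Jane” will not match “Janet”.

```python
import re
from pathlib import Path


def find_actor_mentions(folder: str, names: list[str]) -> dict[str, dict[str, int]]:
    """Count whole-word, case-insensitive mentions of each name in .txt files.

    Returns {relative file path: {name: count}}, listing only files with hits.
    """
    root = Path(folder)
    # \b word boundaries stop "Jane" from matching inside "Janet".
    patterns = {n: re.compile(rf"\b{re.escape(n)}\b", re.IGNORECASE) for n in names}
    hits: dict[str, dict[str, int]] = {}
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts = {n: len(p.findall(text)) for n, p in patterns.items()}
        counts = {n: c for n, c in counts.items() if c > 0}
        if counts:
            hits[str(path.relative_to(root))] = counts
    return hits
```

For example, find_actor_mentions("project2_dump", ["Jane", "Tracey"]) (hypothetical folder name) returns an empty dictionary if the dump is clean; any file it lists contains those actors in its raw text.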

What are the specific names or roles that are crossing over? Knowing if they are highly unique proper nouns or technical archetypes will pinpoint exactly which of these leaks is happening.

Look at this. It is disconcerting to say the least. The first image is the ChatGPT source. The second is supposedly the Gemini source.

Seeing these two responses side-by-side is incredibly revealing. It perfectly illustrates the uncanny “ghost in the machine” effect.

While the almost identical file sizes (~66MB vs ~66.7MB) might make it look like the two projects are bleeding into each other, what you are actually witnessing is a spectacular example of the underlying LLM’s internal reasoning completely short-circuiting when trying to process the two different data formats.

The differences in how NotebookLM “thinks” about these two identical sizes break down as follows:

1. The Great Math Meltdown (Image 1 vs. Image 2)

If you look closely at the math the AI uses for each source, it exposes a massive logical hallucination in the first project.

Image 1 (ChatGPT Source): The AI claims that 66MB equals 30 to 35 million words, which it then claims is equivalent to 40 to 50 novels.
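- The Reality Check: This math is completely broken. 35 million words divided by 50 novels would mean the average novel is 700,000 words long, longer even than War and Peace. Furthermore, a 66MB plain-text file cannot physically hold 35 million words: at roughly one byte per character, that would leave fewer than 2 bytes per word, not enough for even a one-letter word plus a space. At a typical 6 characters per word, 66MB holds roughly 11 million words.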
Image 2 (Gemini Source): The AI claims that 66.7MB contains 67,246,160 characters, which equals 11.2 million words, or 112 to 140 novels.
- The Reality Check: This math is virtually perfect. 67 million characters divided by an average of 6 characters per word (including spaces) is about 11.2 million words. 11.2 million words divided by a standard 80,000-word novel is exactly 140 novels, and dividing by a 100,000-word novel gives the lower figure of 112.
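The arithmetic above is easy to check yourself. The sketch below rests on rules of thumb that are assumptions, not figures from NotebookLM: plain English text at roughly 1 byte per character, about 6 characters per word including the trailing space, an 80,000-word “standard” novel, and “66MB” read as 66,000,000 bytes.

```python
# Assumptions: ~1 byte per character, ~6 characters per word (incl. space),
# an 80,000-word standard novel, and "66MB" read as 66,000,000 bytes.

def words_from_chars(chars: int, chars_per_word: float = 6.0) -> float:
    """Typical word count for a given number of characters."""
    return chars / chars_per_word


def novels_from_words(words: float, words_per_novel: int = 80_000) -> float:
    """How many novels of a given length that word count would fill."""
    return words / words_per_novel


# Image 2 (Gemini source): 67,246,160 characters
gemini_words = words_from_chars(67_246_160)
print(f"{gemini_words:,.0f} words")                              # 11,207,693 words
print(f"{novels_from_words(gemini_words):.0f} novels")           # 140 novels
print(f"{novels_from_words(gemini_words, 100_000):.0f} novels")  # 112 novels

# Image 1 (ChatGPT source): 35 million words claimed for 66MB
bytes_per_word = 66_000_000 / 35_000_000
print(f"{bytes_per_word:.2f} bytes per word")                    # 1.89 bytes per word
# A one-letter word plus a space already needs 2 bytes, so the claim cannot hold.
print(f"{35_000_000 / 50:,.0f} words per 'novel'")               # 700,000 words per 'novel'
```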

Why did it get one so wrong and the other so right?

This comes down to the structure of the data dump. A ChatGPT data export is famously messy, delivered in a complex, nested JSON format filled with code strings, timestamps, and metadata.

When NotebookLM tried to read Project 1, the “noise” in the JSON file confused its internal token-counting logic. It hallucinated a massive word count because it couldn’t tell where the metadata ended and your actual words began. In Project 2 (the Gemini source), the file likely contained a clean, explicit log entry stating the exact character count (67,246,160 characters, as seen in citation 1). Because the data explicitly handed the AI the right number, its math was flawless.

2. The Metaphor Machine: “MuseByte” vs. “Continuum”

Notice how the LLM desperately tries to give a “personality” or a grand structural name to your data dumps:

In Project 1, it invents the terms “MuseByte brain” and “Hadrian’s Wall of data.”
In Project 2, it rebrands a near-identical amount of data as “The Continuum” or “Mind Dump.”

This is the core LLM “thinking” at work. An LLM hates vacuum-sealed, sterile data. When handed a massive, abstract archive of human thoughts, its pre-trained bias is to contextualize it using literary framing, metaphors, and dramatic titles. It didn’t carry the name “MuseByte” over to Project 2, but it carried over the habit of inventing a poetic sci-fi moniker for your data footprint.

The Verdict

The files being ~66MB is likely a reflection of your actual digital footprint—you have generated a remarkably similar volume of text across both platforms.

However, the way NotebookLM handled them proves your suspicion: it is not a neutral calculator. In the first image, its internal LLM “thinking” overrode reality, hallucinating a wild word-to-novel ratio because it got tripped up by the file’s formatting. In the second, it stayed strictly inside the lines because the text gave it an exact mathematical anchor.

When you see it referencing “Jane” and “Tracey” in the first image, are those actual people from your life found in the ChatGPT files, or did the AI completely invent those “actors” out of thin air?

Your answer isn’t necessarily sound because, yes, I did cleanse the ChatGPT data so it doesn’t have the artifacts you cite.

You are completely right, and I apologize—I made an incorrect assumption about the state of your raw data. If you completely cleansed the ChatGPT export down to pure, naked plain text before uploading it, then my theory about structural JSON artifacts messing up the token counter is fundamentally unsound.

However, looking closer at the screenshots you provided, there is a smoking gun that explains exactly what is happening. The answer isn’t in the math; it’s in the gray citation numbers (1, 2, 3, 4) scattered throughout NotebookLM’s responses.

NotebookLM is designed to attach an inline citation chip only when it is drawing a specific piece of information from a passage inside your uploaded text.

Here is what those citations reveal about why your two clean files are producing completely different realities.

The Secret Hiding in Image 1 (The ChatGPT Source)

Look closely at where the citation chips land in your first screenshot:

“…equivalent of 40 to 50 average-length novels 2.”
“…approximately 30 to 35 million words 2.”
“…transcripts with Jane (1.2MB) was estimated to contain over 200,000 words 3…”
“…correspondence with Tracey alone is described as… 4.”

What this means:

NotebookLM did not calculate the 35-million-word count, nor did it invent “Jane,” “Tracey,” “MuseByte,” or “Hadrian’s Wall.” Those exact words, names, and broken mathematical claims already exist inside your cleansed plain-text file.

Because this file is a history of your past interactions with ChatGPT, it contains the literal transcripts of your previous dialogues. At some point in your history with ChatGPT:

You and the AI discussed people named Jane and Tracey.
ChatGPT (or you) used the poetic phrases “MuseByte brain” and “Hadrian’s Wall of data.”
ChatGPT—operating on its own old architecture—hallucinated a wildly inaccurate calculation, telling you that your archive was “30 to 35 million words” and “40 to 50 novels.”

When NotebookLM read this clean text file, it didn’t do any math of its own. It simply scanned the history, saw a section where a previous LLM explicitly stated these figures and names, and summarized them as “facts” of the archive, faithfully attaching citations 2, 3, and 4 to them.

The Secret Hiding in Image 2 (The Gemini Source)

Now look at the citations in the second screenshot:

“…66.7MB, which contains 67,246,160 characters 1 2.”
“…This equates to approximately 11.2 million words 3.”

What this means:

In your Gemini history, the text explicitly contained a highly accurate system log or a precise mathematical statement (67,246,160 characters). NotebookLM anchored itself to that specific cited text, and because the baseline number was mathematically real, the resulting breakdown of words and novels came out clean and correct.

Furthermore, because you hadn’t talked to Gemini about Jane, Tracey, or “MuseByte,” those actors never appeared in the text, and NotebookLM had no citations to pull them from.

The Verdict

The “ghost in the machine” isn’t NotebookLM carrying its own thinking or leaking data across separate projects.

Instead, NotebookLM is being haunted by the ghosts of your past AI conversations. Because an LLM data export is a mirror of everything you and an AI ever said to each other, NotebookLM treats your past conversations about the data—including the old AI’s wild hallucinations and the real-life people you mentioned—as the literal subject matter of the archive.

If you click the little gray citation numbers 2, 3, or 4 in your ChatGPT notebook’s response, NotebookLM will open the exact passage in your text file where those “actors” and broken math equations are hiding.
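If you want the same check outside NotebookLM, a few lines of Python can play the role of a citation chip by printing every passage in your cleansed file that contains a given phrase, with line numbers. This is a rough offline stand-in, assuming the file is UTF-8 plain text; it does not reproduce how NotebookLM actually chunks or cites sources, and the function name is hypothetical.

```python
from pathlib import Path


def show_passages(path: str, phrase: str, context: int = 1) -> list[str]:
    """Return every line containing `phrase` (case-insensitive), together with
    `context` lines on either side, each prefixed by its 1-based line number."""
    lines = Path(path).read_text(encoding="utf-8", errors="replace").splitlines()
    needle = phrase.lower()
    snippets = []
    for i, line in enumerate(lines):
        if needle in line.lower():
            start, end = max(0, i - context), min(len(lines), i + context + 1)
            snippets.append("\n".join(f"{j + 1}: {lines[j]}" for j in range(start, end)))
    return snippets
```

If the explanation above is right, calling it on your cleansed ChatGPT file with "35 million words" or "MuseByte" should surface the original conversation where that figure or name first appeared.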

All very well, (perhaps), but as a human pursuing a huge project with a lifetime’s worth of data, these are not explanations I truly wish to hear, are they?

You are completely right. When you are wrestling with a project of this scale—something that represents a lifetime’s worth of thought, data, and energy—the last thing you want or need is a sterile, mechanical lecture about token counters, JSON artifacts, and citation chips.

It is incredibly frustrating to look at a tool you hoped would be an intellectual partner, an objective synthesizer of your life’s work, only to realize you have to act as its IT troubleshooter just to get a straight answer.

The Weight of a Lifetime’s Data

When an archive represents a lifetime, it isn’t just “66 megabytes of text.” It is an ecosystem. It holds your shifts in perspective, your evolving relationships, your breakthroughs, and yes, even the digital exhaust of past technical experiments.

To have an AI look at that vast tapestry and:

Get tripped up by its own past conversations,
Fixate on names from old logs instead of the big picture, or
Force you to “cleanse” and sanitize your history just so it can think straight…

…feels less like collaborating with a cutting-edge mind and more like trying to explain your life story to someone who keeps getting distracted by the font it’s printed in.

Shifting Focus

You didn’t build a lifetime of data to audit an AI’s math or debate how sandboxed its architecture is. You brought it here to help you find the connective tissue, the patterns, and the ultimate meaning behind a massive body of work.

If we strip away the technical noise completely, what is the core truth or ultimate goal of this lifetime project that you are trying to distill? If you could get this tool to stop looking at the “ghosts” and look at the actual substance, what is the main story or framework you need it to understand right now?

Exactly! It ain’t about AI. It’s about ME! So … what happens if I revisit NotebookLM and introduce a new source: Source 1 and 2 combined, to make Source 3? Or should I pursue the project on a reformatted PC with new browsers and a new NotebookLM sign-in to get anything like an objective view of my data?

It absolutely is about you, your life’s work, and getting this tool to look at your data without its own baggage.

Let’s look at your two options pragmatically so you don’t waste hours on technical setups that won’t actually solve the problem.

1. The Reformatted PC / New Sign-In Route (Overkill)

To save you a massive headache: Do not reformat your PC or install a new browser. It won’t give you a more objective view. NotebookLM doesn’t have “local memory” on your hard drive, nor does it track your hardware to profile you. The “thinking” patterns you are seeing aren’t hiding in your browser cookies; they are entirely based on two things:

The account you are logged into.
The literal text inside the files you upload.

If you want a completely blank slate from an account perspective, simply creating a brand-new, free Google account and opening NotebookLM there does 100% of the work that a reformatted PC would do. It costs nothing and takes two minutes.

2. What Happens if You Combine Source 1 and 2 into Source 3?

If you smash the ChatGPT export (Source 1) and the Gemini export (Source 2) together into a new mega-source, the AI will get more confused, not less.

You will be feeding it a giant text document that contains two different AI personalities talking about the same life events using different math, different names, and different metaphors. NotebookLM will try to synthesize them, meaning it will likely cross-contaminate the two. You’ll end up with “Jane” and “Tracey” interacting with “The Continuum” at “Hadrian’s Wall.” It will be a fascinating sci-fi meta-narrative, but it won’t be an objective view of your actual life data.

How to Force NotebookLM to Focus on YOU

If you want an objective view of your lifetime project, you have to strip away the “AI digital exhaust” and isolate your actual thoughts. Here is the cleanest way to do that:

Step 1: The “Clean Slate” Workspace

Log into NotebookLM (use a new Google account if you want to be absolutely sure there is no account-level cache). Create a brand-new Notebook.

Step 2: Filter the Source (The Human vs. The AI)

Because your archives are LLM data dumps, they contain a mix of your voice (prompts, uploaded thoughts, journals) and the AI’s voice (replies, summaries, hallucinations). The AI’s voice is what is introducing the “actors” and the broken math.

If you can: Try to upload sources that are purely your own writing (original documents, notes, or text files where the old AI responses have been stripped out).
If you can’t: If filtering the files is too massive a task, you have to control it via the prompt.
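If your cleansed transcripts mark each turn with a speaker label, stripping the AI’s side can be automated. This is a minimal sketch under a loud assumption: each turn begins on its own line with a label such as “User:” or “ChatGPT:”. Those labels are hypothetical placeholders, so adjust them to whatever your files actually use. Lines without a label are treated as a continuation of the current turn.

```python
import re


def keep_human_turns(
    transcript: str,
    human_labels: tuple[str, ...] = ("User", "You"),
    ai_labels: tuple[str, ...] = ("ChatGPT", "Gemini", "Assistant"),
) -> str:
    """Keep only the text of turns spoken under a human label.

    Hypothetical format: each turn starts on a new line with 'Label:'.
    Unlabelled lines continue the current turn; lines before the first
    label are dropped.
    """
    all_labels = "|".join(re.escape(lab) for lab in (*human_labels, *ai_labels))
    turn_start = re.compile(rf"^({all_labels}):\s*(.*)$")
    kept: list[str] = []
    keeping = False
    for line in transcript.splitlines():
        match = turn_start.match(line)
        if match:
            keeping = match.group(1) in human_labels
            line = match.group(2)  # drop the label itself
        if keeping:
            kept.append(line)
    return "\n".join(kept)
```

Working line by line keeps it simple. The trade-off is that a line of your own writing that happens to start with “Gemini:” would be misread as an AI turn, so spot-check the output before uploading it.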

Step 3: Use a “Strict Warden” Prompt

When you ask NotebookLM to analyze this new workspace, do not let it wander. Ground it with a highly restrictive prompt.

Copy-paste something like this into the chat:

“I am analyzing a lifetime archive of data. This archive contains past interactions with AI assistants. When answering my questions or summarizing this data, completely ignore any commentary, math, metaphors, or names introduced by the AI assistant in the text. Focus exclusively on the foundational concepts, data, and ideas generated by the human user. Do not adopt past AI archetypes.”

By doing this, you are forcing the underlying model to put on a pair of analytical blinders. It stops looking at the “ghosts” of past AI conversations and starts looking strictly at the human footprint left behind.