Large language models are going to fundamentally change how we create and consume documents in an era where everybody will be getting information via chatbots.
I have to spend a lot of time reviewing information to try to stay abreast of current trends. News articles, written by journalists who have been trained in structure and storytelling, are usually a pleasure to read — and I’m guessing that they’re an excellent source for information querying.
But a lot of documents and web pages I have to review fall squarely into the category of “marketing mush”: SEO-optimized fluff where I struggle to understand what the product does, when I would use it, and why it’s different from the alternatives.
In the short term, the world is going to see a lot more “astroturf” content (astroturf looks like grass, but it isn’t real).
There are lots of office jobs that involve a bored person being paid to write something they’re not really interested in that another bored person has to read as part of their job. It turns out that LLM technologies such as ChatGPT are perfect for creating and then summarizing the kinds of corporate texts that nobody wanted to read in the first place.
People who struggle to write coherent sentences are turning to these engines to help them churn out text. This can be for good reasons (for example, if English isn’t your native language). But the worst examples will look like the cartoon below, with office workers pointlessly padding out text while others use AI to strip it back down to the essentials.
This is now an actual feature of office suites, by the way! It was announced at Google I/O, and Microsoft 365 Copilot is proposing something similar.
And unfortunately, this is going to apply to the web as a whole: the astroturf content created by AI (documents, web pages, dialogues, etc.) will be ingested into the next generation of large language models, creating layer upon astroturf layer…
Looking to the future, what’s the point of documents? They’ll increasingly become “just” elements in databases of text. People will still have to communicate knowledge by writing it down, but the form and formatting will become less relevant because the text is going to be consumed through chatbot queries and intelligent summarization.
This has already happened in some areas — if you want to apply for a job, you had better make sure that your résumé is machine readable, or you will never get past the applicant tracking system.
But it’s quickly going to go to a whole new level. For example, I talked to Stripe a few weeks ago, and they have announced that they are enabling the use of OpenAI’s GPT-4 on their documentation:
So where is this going to go?
After all, although astroturf isn’t real, it doesn’t mean it’s not useful. It has a lot of advantages over real grass in the right circumstances.
For things like documentation, it makes sense that text will be published in the form of short, consumable, discrete facts that lend themselves to easy summarization.
But what will become of all the marketing documents constructed around three-point bullet trees, especially if they’re combined with documents from different sources and other vendors? Will that structure be kept by the chatbot interface when summarizing? And should it be?
How will this affect how people create documents that aren’t just about facts, such as marketing materials? What will happen to storytelling and emotions? Will they be stripped out by the summarization engines?