When AI Becomes the Ultimate Capitalist: The Hidden Risks of Automated Power
I finally understood the purpose of AI alignment. "The Capitalist Agent" I outlined in an earlier post is a necessary prerequisite for AGI.
For the full context, maybe also have a look at the following two articles. I know some of their content now seems contradictory, but I feel like those contradictions are finally resolving for me:
Thanks a lot to Seth Herd on LessWrong, whom I don't know personally, but whose patient responses stimulated this post.
Monkeys or ants might think humans are gods because we can build cities and cars and create ant poison. But we're really not that much smarter than them, just smart enough that they have no chance of getting their way when humans want something different than they do.
The only assumptions are that there's not a sharp limit to intelligence at the human level (and there really aren't even any decent theories about why there would be), and that we'll keep making AI smarter and more agentic (autonomous).
You're envisioning AI smart enough to run a company better than a human. Wouldn't that be smart enough to eventually outcompete humans if it wanted to? Let alone if it gets just a bit smarter than that. Which it will, unless all of humanity unites in deciding to never make AI smarter than humans. And humanity just doesn't all agree on anything. So there's the challenge - we're going to build things smarter than us, so we'd better make sure its goals align with ours, or it will get its way - and it may want the resources we need to live.
— Seth Herd on LessWrong, 2025-02-06
Yep, 100% agree with you. I had read so much about AI alignment before, but to me it had always been abstract jargon. I just didn't understand why it was even a topic, why it was even relevant, because, to be honest, in my naive thinking it all seemed like an excessively academic exercise in which smart people want to make the population feel scared so that their research institution gets the next big grant and they don't need to care about real-life problems. Thanks to you, I'm finally getting it. Thank you so much again!
At the same time, while I fully understand the "abstract" danger now, I'm still trying to understand the transition you're making from "envisioning AI smart enough to run a company better than a human" to "eventually outcompeting humans if it wanted to".
The way I initially thought about this "Capitalist Agent" was as a purely procedural piece of software. That is, it breaks its main goal (in this case: earning money) down into manageable sub-goals, until each of these sub-goals can be solved through either standard computing methods or some generative AI integration.
As an example, I might say to my hypothetical "Capitalist Agent": "Earn me a million dollars by selling books of my poetry." I would then give it access to a bank account (through some sort of read-write Open Banking API) as well as the PDFs of my poetry to be published. The first thing it might do is found a legal entity (a limited liability company): it might search Google for a suitable advisor and send that advisor automatically generated emails describing my business idea, or, in case my local government is already digitized enough, take the "computer use" approach and fill out the respective eGovernment forms online automatically. Later it would do something similar by automatically "realizing" that it needs to make deals with publishing houses, with printing facilities, and so on. Essentially just basic Robotic Process Automation on steroids: everything a human could do on a computer, this software could do as well.
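The decomposition idea above can be sketched in a few lines. This is a purely hypothetical illustration, not a real system: the functions `is_directly_executable` and `decompose` are stand-ins for whatever tool registry or generative-AI call an actual agent would use.

```python
# Hypothetical sketch of the "Capitalist Agent" loop described above:
# recursively break a goal into sub-goals until each one can be handled
# directly, either by conventional code or by a generative-AI tool call.
# All names and heuristics here are invented for illustration.

from dataclasses import dataclass


@dataclass
class Goal:
    description: str


def is_directly_executable(goal: Goal) -> bool:
    # Placeholder heuristic: a real agent would check whether a known
    # tool (banking API, e-mail sender, form filler, LLM) can handle
    # the goal as-is. Here we just look for a marker in the text.
    return "via tool" in goal.description


def decompose(goal: Goal) -> list[Goal]:
    # Placeholder: a real agent might ask an LLM to propose sub-goals.
    return [Goal(f"{goal.description} -- step {i} via tool") for i in (1, 2)]


def plan(goal: Goal) -> list[str]:
    """Depth-first expansion of a goal into a flat list of executable actions."""
    if is_directly_executable(goal):
        return [goal.description]
    actions: list[str] = []
    for sub in decompose(goal):
        actions.extend(plan(sub))
    return actions


actions = plan(Goal("Earn $1M by selling poetry books"))
```

The point of the sketch is only that nothing here requires superintelligence: it's ordinary recursion over sub-goals, with the "intelligence" pushed into whichever tools sit behind the two placeholder functions.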
But: It would still need to obey the regular laws of economics, i.e. it couldn't create money out of thin air to fulfill its tasks. Pretty much anything it would "want to do" in the real world costs money.
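The economic constraint can be made concrete with a trivial sketch: every real-world action is gated by the account balance, so the agent can only act as fast as it earns. The action names and costs below are invented for illustration.

```python
# Minimal sketch of the constraint above: every real-world action has a
# price, so execution is gated by the current account balance.
# All figures and action names are made up for illustration.

def execute_with_budget(actions: dict[str, float], balance: float) -> list[str]:
    """Run actions in order, skipping any that the balance cannot cover."""
    done = []
    for name, cost in actions.items():
        if cost <= balance:
            balance -= cost  # money leaves the account; none appears from thin air
            done.append(name)
    return done


completed = execute_with_budget(
    {"register LLC": 500.0, "print first run": 2000.0, "ad campaign": 10000.0},
    balance=3000.0,
)
```

In this example the ad campaign is simply skipped: the agent, however capable, cannot spend money it does not have.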
So in the next step, let's assume that, after I have gotten rich from my poetry, the next instruction I give this agentic "AGI" is: "Please, dear AGI, please now murder all of humanity."
Then it thinks through all the steps (i.e. procedurally breaking the big task down into chunks a computer can execute) and eventually it can say with absolute certainty: "Okay Julian, your wish is my command."
Obviously the first thing it would do is create the largest, most profitable commercial company in the world, initially playing by the rules of capitalism until it has accumulated so much money that it can take over (i.e. simply "buy out") an entire existing government that already has nuclear missiles. At least, that would be the "most efficient" and fail-safe approach I can see. Its final action would be to press the red button and exterminate all of humanity. Success!
But the thing is: no one would know until it's too late. Obviously, as Mr. Evil, I wouldn't tell anyone that my business decisions are actually led by an AI/AGI. I would appear on the covers of Forbes, Fortune and the like, eventually become the richest person in the world, and everyone would pat me on the shoulder for my "visionary thinking" and my "innovative style of doing business", because everyone would believe that I am the sole decision maker in that company. The AI would stage everything to make me look like a big philanthropist, "saving" humanity from nuclear weapons. The AI would make sure it always stays behind the scenes, so that no one except me would ever even know about its existence. I would be a wolf in sheep's clothing until the very last moment, and no one could stop me, because everyone is fooled by me.
Even though there is no rational reason why I would even want to kill humanity, it is entirely possible for any human to develop that idée fixe.
In a way, I am actually a lot more afraid of my scenario. And that's exactly why I wrote this blog post about "The Capitalist Agent" and why I'm criticizing ongoing AI alignment research: of course, hypothetically, AI could turn itself against humanity completely autonomously. But at the end of the day, there would still need to be a human "midwiving" that AGI, someone who allows the AI to interface with the monetary and financial system in particular, for that software to be able to do anything in the "real world" at all.
Right now (at least that's the vibe in the industry), one of the most "acceptable" uses for AI is to automate business processes, customer interactions (e.g. in customer support), and so on. But if you extrapolate that, you get the puzzle pieces to run every part of a business in a semi-automated and eventually fully automated fashion (that's what I mean by "Capitalist Agent"). This means that no outside observer can distinguish anymore whether a business owner is guided by AI or not, because no business owner will honestly tell you. And every one of them can always say "but I'm just earning so much money to become a philanthropist later", so they always have plausible deniability, right up until they have accumulated so much money through this automated, AI-run business that they can use it for very big evil very quickly. It's just impossible to know beforehand, because you cannot learn the "true motivation" in any human's head.
The only thing that you as AI alignment researchers will eventually be confronted with is AIs fixated on earning as much money as possible, because money means power, and only with power can you cause violence. But it's simply impossible for you to know what the person for whom the AI is earning all this money actually wants to do with it in the future.
The main question which you, as AI alignment researchers, will need to answer is: "Is it moral, and aligned with societal values, for any AI-based system to earn money for an individual or a small group of people?"
That is, you will need to investigate all the nuances of that question and make concrete rules, and eventually laws, for business owners, not for AI developers or programmers.
Or is that what you're already doing, and I'm just reinventing the wheel? (Sorry if I did; sometimes I just need to go through the thought process myself to grasp a new topic.)
Based on this post, and with the help of ChatGPT, I created a pitch deck, which is actually two complementary pitch decks. Click here to view it.