How Much Real Work Can AI Agents Actually Handle? The Reality of BPO and AI in the Philippines

How Much Real Work Can AI Agents Actually Do? — What the Agents' Last Exam Reveals, and a BPO Strategy for the Philippines

Drawing on the latest test results — where even the world's most advanced AI passed only 24% of professional tasks — this article explains the realistic approach Japanese companies should take to the Philippine BPO sector and AI-agent adoption.

Part 1: Why This Matters

Step 1: The Philippine Business Context (3 min)

"Even cutting-edge AI can see real, professional work all the way through only about once in every four attempts." That is the reality revealed by the results of a new exam released by UC Berkeley (the University of California, Berkeley). The exam is called the "Agents' Last Exam (ALE)" and measures whether AI can perform valuable professional work in place of a human.

The Philippines is one of the world's leading hubs for BPO (business process outsourcing — the industry of taking on companies' operations from the outside). A great deal of work, from call centers to accounting shared services and IT help desks, flows into this country from companies overseas. That is precisely why the question of "how far AI can replace knowledge work" bears directly on the future of the Philippine economy.

These results also matter for Japanese companies expanding into the Philippines and for Japanese people working there. Pinning excessive hopes on AI and trying to replace staff all at once is likely to fail. Used well, on the other hand, AI can raise the quality of each local staff member's work. The exam results give you material for judging how far to rely on AI: neither too much nor too little.

Suppose you are a Japanese manager at a Manila office. Head office has asked you, "Can't we halve the accounting team with AI agents?" You broach the subject with your Filipino IT lead like this: "I'd like you to look at the exam results Berkeley released last week. Even the smartest AI today passed only 24% of real professional tasks. So instead of 'handing everything over,' let's decide together 'which tasks to have it help with.'" That single remark becomes the starting point for a realistic plan.

Step 2: Key Points from the Source Article (5 min)

Drawing only on the facts from the source article, we have summarized the main points in a table.

Item	Details
Name of the exam	Agents' Last Exam (abbreviated ALE)
Organization that created it	The RDI (Center for Responsible, Decentralized Intelligence) at UC Berkeley (the University of California, Berkeley)
Experts involved	An advisory committee of more than 300 experts across various fields
Purpose of the exam	To measure whether AI can perform economically valuable, long-horizon professional work
1st place	OpenAI's GPT-5.5 (running within the Codex harness), pass rate 24.0%
3rd place	Anthropic's Claude Fable 5 (a new Mythos-class model, tested the day after release), pass rate 22.0%
Result at the hardest stage	Many of the latest models posted a harsh 0.0% pass rate
Overall takeaway	Even the world's most advanced AI cannot see most real professional work through to completion

Source: VentureBeat, "Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents' Last Exam benchmark" (June 10, 2026)

This table was created for educational purposes from facts in publicly available information. Please consult the original article cited above for details.

Related: see "How AI Agents Help Philippine Businesses Automate Complex Tasks" for a detailed discussion.

Step 3: Comprehension Check (5 min)

Q1. Which university's research organization created this new exam, the "Agents' Last Exam (ALE)"?

Hint: It's a state university on the US West Coast, famous for computer science.

Q2. What was the name of the AI model that came in first, and what was its pass rate?

Hint: It's an OpenAI model, with a pass rate in the low 20% range.

Q3. What place did Anthropic's Claude Fable 5 take, and what was its pass rate?

Hint: The gap to first place was just 2 percentage points in pass rate.

Q4. What kind of work is this exam trying to measure?

Hint: Not a one-off quiz, but valuable professional work that requires a long sequence of steps.

Q5. At the hardest stage, what pass rate did many of the latest models record?

Hint: It's a figure you could describe as "barely able to make any headway."

Related: see "How AI Agents Help Philippine SMEs Build a Digital Workforce" for a detailed discussion.

Part 2: Putting It Into Practice

Step 4: Implementation Steps in the Philippines (10 min)

What these results convey is not "trust AI wholesale and replace people," but "have it assist people's work and raise the quality." We have summarized in a table the steps for bringing this mindset into a Philippine workplace.

Step	What to Do	Things to Watch For in the Philippines
1. Separate out the work	Split your company's work into "short, routine tasks" and "long, professional work"	At BPO sites, work is often broken down finely, which makes it a good fit for this kind of separation
2. Start small	First try AI as an aid with a single team	Start with a small budget on the order of a few thousand to tens of thousands of pesos a month, then expand after seeing the results — that's the safe way (always estimate the amount in-house)
3. Check the data	Decide what information may and may not be given to the AI	If you handle personal information, you need to check the Data Privacy Act (RA 10173), which is overseen by the NPC (National Privacy Commission)
4. Human review	Build a flow where a person always checks the AI's output	In workplaces where the culture of "saving the boss's face" persists, mistakes can be hard to point out. Make review an explicitly assigned role
5. Record the results	Record how much time was saved and use it to inform the next decision	In a field where verbal agreements are common, putting decisions in writing prevents later misunderstandings

What matters at each step is to drop the premise of "handing everything over." Even the smartest AI today has a pass rate of 24%. That is exactly why you should build in a mechanism for people to make the final check from the very start.
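
The "person always checks" flow from step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AiDraft` structure, the `approve` and `release` functions, and the sample task are hypothetical names invented for this example, not part of any tool mentioned in the source article. The idea is simply that AI output cannot move forward until a named reviewer has signed off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AiDraft:
    """One piece of AI output waiting for a human check (hypothetical structure)."""
    task: str
    content: str
    reviewer: Optional[str] = None  # the person explicitly assigned to review
    approved: bool = False


def approve(draft: AiDraft, reviewer: str) -> None:
    """Record that the named reviewer has checked and accepted the draft."""
    draft.reviewer = reviewer
    draft.approved = True


def release(draft: AiDraft) -> str:
    """Return the content only if a named person has approved it."""
    if not draft.approved or not draft.reviewer:
        raise PermissionError(f"'{draft.task}' has not passed human review")
    return draft.content


# Hypothetical example: a reply drafted by AI for a billing inquiry.
draft = AiDraft(task="Reply to billing inquiry", content="Dear customer, ...")
approve(draft, reviewer="Team lead")
print(release(draft))
```

Because `release` raises an error for unreviewed drafts, skipping the check becomes a visible failure rather than a silent habit, which fits workplaces where pointing out mistakes is socially difficult.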

Step 5: Common Mistakes and How to Avoid Them (5 min)

Here are three common failures that come up when adopting AI agents in the Philippines.

Pitfall 1: Assuming "AI can handle everything"

What not to do: On head office's instructions, you halve the accounting team first and then bring in AI. The AI then drops steps midway through long workflows, and the month-end close breaks down.

What to do instead: Without cutting staff, you first trial AI as an aid and spend three months gauging which tasks it can take on. You then move over to AI, little by little, only the tasks where you have confirmed results.

Pitfall 2: Starting to use it without deciding how to handle data

What not to do: You try to streamline work by feeding customers' personal information straight into the AI, with no clear in-house rules. Later you discover a risk of running afoul of the Data Privacy Act and have to redo the work entirely.

What to do instead: Before starting, you decide the scope of information that may be given to the AI. You convert personal information into a masked form before use and run things in line with NPC rules.
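
As a rough illustration of "converting personal information into a masked form," the Python sketch below replaces email addresses and Philippine-style mobile numbers with placeholders before text is passed to an AI tool. The two patterns and the function name `mask_personal_info` are assumptions made for this example. They are not a complete list of personal information and do not by themselves ensure compliance with the Data Privacy Act, so the actual rules should be agreed with your legal team.

```python
import re

# Illustrative patterns only (assumptions): they catch common email addresses
# and mobile numbers written as 09XXXXXXXXX or +639XXXXXXXXX, nothing else.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PH_MOBILE = re.compile(r"(?:\+63|0)9\d{9}\b")


def mask_personal_info(text: str) -> str:
    """Replace email addresses and mobile numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PH_MOBILE.sub("[PHONE]", text)
    return text


# Hypothetical sample message; note that the name "Maria" is NOT masked.
message = "Please call Maria at 09171234567 or email maria@example.com."
print(mask_personal_info(message))
```

Names, addresses, and ID numbers pass through this sketch untouched, which is one reason to define the permitted scope of information in writing before any tool is switched on.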

Pitfall 3: Skipping the explanation to local staff

What not to do: Japanese managers decide on AI adoption by themselves and tell local staff only "you'll be using this tool starting tomorrow." Staff grow anxious that "my job is being taken away," and the tool goes unused on the ground.

What to do instead: Before adoption, you hold a briefing for staff and carefully convey the purpose: "not to cut staff, but to make the work easier." You also set aside time to take questions and get buy-in before starting.

Part 3: Going Deeper

Step 6: Related Technical Terms (5 min)

An AI agent (autonomous AI) is an AI that works toward a goal by figuring out the steps on its own, without a person directing it one instruction at a time. At a Manila BPO site, for example, you might have it handle a whole flow: reading the content of an inquiry email, looking up past records, and even drafting a reply.

The Agents' Last Exam (ALE) is a test designed to measure whether AI agents can see real professional work through to the end. When a call-center operations company compares new AI tools, looking at results from tests like this gives it a basis for judging real ability without being swayed by marketing claims.

Long-horizon workflows refer to the kind of work that is completed only after carrying out many steps in sequence. In a Manila accounting shared-services operation, this would be the connected chain of tasks from voucher verification to journal entry to the month-end close — an area where AI tends to drop steps midway.

A benchmark is a common yardstick that compares multiple AIs under the same conditions to measure, on a level footing, how well each performs. Staff choosing AI tools in the Philippines find it easier to explain decisions internally when they base them on the numbers from such a common yardstick.

A harness (the execution framework that runs an AI) is the mechanism that connects the AI model itself to actual work and makes it run. In this test, even the same model scored differently depending on which framework it ran within, so when choosing a tool locally you need to look at both "the AI inside" and "the framework that runs it."
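
One simplified way to see why long-horizon workflows are hard is to look at how small per-step error rates add up. The Python sketch below assumes that every step succeeds independently with the same probability. That is an illustrative assumption, not how ALE is scored and not a figure from the source article; the 95% value is hypothetical.

```python
def chance_all_steps_succeed(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming steps are independent."""
    return per_step_success ** steps


# Hypothetical per-step success rate of 95%, for chains of different lengths.
for steps in (1, 5, 10, 20):
    chance = chance_all_steps_succeed(0.95, steps)
    print(f"{steps:>2} steps at 95% each -> {chance:.0%} chance of a clean run")
```

Under these assumptions, a 20-step chain finishes without a single dropped step only about 36% of the time, even though each individual step looks reliable. Real workflows are messier, but the arithmetic suggests why a chain such as voucher verification, journal entry, and month-end close needs human checkpoints along the way.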

Step 7: Applying This to Your Own Company (10 min)

Separate the parts of your work "AI can handle" from the parts "people must own"

These results show that AI cannot do everything. Start by breaking your company's work down finely and sorting it into short, routine tasks and long, professional work.

Something to think about: If you use the criterion "if the AI forgot one step midway, who would notice it and where?," the parts people should own start to come into view.
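
If you prefer a spreadsheet or script to paper, the same sorting can be sketched in Python. Everything here is a hypothetical illustration: the `Task` fields, the three-step cutoff, and the sample tasks are assumptions chosen for the example, and the right criteria for your company should come from observing your own work.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    steps: int             # rough number of sequential steps
    needs_judgment: bool   # does it require professional judgment?


# Hypothetical cutoff: adjust after observing your own work.
MAX_STEPS_FOR_ROUTINE = 3


def sort_task(task: Task) -> str:
    """Label a task as 'short, routine' or 'long, professional'."""
    if task.steps <= MAX_STEPS_FOR_ROUTINE and not task.needs_judgment:
        return "short, routine"
    return "long, professional"


# Hypothetical sample tasks from an accounting team.
tasks = [
    Task("Log an incoming invoice", steps=2, needs_judgment=False),
    Task("Month-end close", steps=15, needs_judgment=True),
]
for task in tasks:
    print(f"{task.name}: {sort_task(task)}")
```

Tasks labeled "long, professional" are the ones where the question of who would notice a forgotten step matters most, so they are natural candidates for keeping a person in charge.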

Design AI adoption at BPO sites as a "lift," not a "replacement"

Bringing in AI as a tool to cut staff tends to invite pushback on the ground. Try switching to the idea of easing each person's work and freeing up their time for higher-value work.

Something to think about: If, after adoption, you check "what new work staff have become able to do thanks to AI," you can tell whether you're achieving a lift.

Build an in-house evaluation process so you aren't swayed by AI-performance hype

AI marketing showcases high numbers, but the best pass rate on real professional work in this exam was just 24%. When deciding on adoption at your company, make a habit of checking results from a common benchmark.

Something to think about: Simply checking, every time, "on what test, and by whom, was this number measured?" raises the quality of your decisions.

Next action: At next week's team meeting, pick one piece of your company's work and write it out on paper, sorted into "short, routine tasks" and "long, professional work." It becomes the starting point for thinking about AI adoption.

Part 4: FAQ

Q1. At a Philippine BPO site, can AI agents really replace staff?

Replacing them entirely right now isn't realistic. In this test, even cutting-edge AI had a pass rate of 24% on real professional work. The longer the workflow, the more likely AI is to drop steps midway. For the time being, the use that suits it is assisting staff and raising the quality of their work.

Q2. When using AI, is there anything to watch for under Philippine data-related laws?

Caution is needed when handling personal information. The Philippines has a Data Privacy Act (RA 10173), overseen by the NPC (National Privacy Commission). Before feeding customers' personal information straight into the AI, set in-house rules for how it is handled and check with your legal team as needed.

Q3. I want to start small. What kind of budget can I begin with?

We recommend starting small with a single team. Beginning with a small budget on the order of a few thousand to tens of thousands of pesos a month, then expanding after seeing the results, is the safe approach. Since the specific amount varies with the tool and how much you use it, always estimate it in-house.

Q4. Japan head office says "automate right away," but what should we explain on the ground?

It's a good idea to show head office this data — that even cutting-edge AI cannot see most real work through to completion. On that basis, propose a phased approach that keeps a human final check. Backing it with numbers makes it easier to head off unrealistic plans.

Q5. Local staff feel anxious about AI adoption. How should I communicate with them?

Before adoption, carefully convey the purpose: "not to cut staff, but to make the work easier." In Philippine workplaces, it can be hard to voice concerns directly to a superior. If you set aside time at a briefing to take questions and get buy-in before starting, the tool is more likely to be used on the ground.

Tips for Getting the Most Out of This (3 Tips)

First, sort your company's work into "short tasks" and "long-workflow work"

What these results show is that AI is weak at long-workflow work. Doing this sorting makes it clear where to use AI and where to keep people. As a first step, draw two columns on paper and write it out.

Build a human final check into the process

A 24% pass rate means that even the top-ranked model failed roughly three out of four tasks, so you should expect AI output to contain errors. Decide, as an assigned role, who checks and at what stage, and build a flow where work can't proceed without that check. This becomes the foundation for preventing mistakes and incidents.

Build the habit of verifying AI marketing numbers against a common benchmark

Marketing puts high numbers front and center, but real ability on real work is another matter. When choosing a tool, check every time "on what test, and by whom, it was measured." It also serves as your basis when explaining things internally.

Bonus: How to Work With PH AI Works

PH AI Works is a company that supports the use of AI and technology in the Philippines. On this article's theme — "realistically bringing AI agents into your operations" — we can help in a way that reflects local conditions.

As a next step, we can advise on things such as:

Separating, among your company's work, which tasks to hand to AI and where people should own the work
Handling information when using AI, in light of the Philippine Data Privacy Act (RA 10173)
Designing the rollout and briefings so local staff can use it with confidence

Please feel free to reach out — initial consultations are free.
