How to Use Claude Code for Free | An AI Cost-Cutting Method for Philippine Development Bases | Case Studies

How to Run Claude Code for Free With Local Models — A Cost-Cutting Guide for Local Philippine Development Teams

This guide explains how to run Claude Code for free using Ollama and local models. It is a practical way for development teams in the Philippines to hold down AI-tool costs and comply with the data privacy law.

Part 1: Why This Matters

Step 1: The Philippine Business Context (3 min)

The IT and BPO (business process outsourcing) industries have grown into major industries in the Philippines. Many Japanese companies have development bases and shared-service centers (back-office centers that consolidate the work of several sites) in Manila and Cebu. At these bases, joint coding work with local engineers is increasing, and adopting AI coding-assistance tools has become a topic of discussion.

However, giving every member of a development team a paid AI coding tool adds up to a surprisingly large monthly cost in peso terms. For example, a paid plan at US$20 per person per month (about 1,140 pesos) for a team of 20 comes to a fixed cost of roughly 270,000 pesos a year. IT salaries in the Philippines are lower than in Japan, but dollar-denominated SaaS fees are the same as what Japan headquarters pays. As a result, tool costs tend to be high relative to local labor costs.
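The annual figure above can be checked with a few lines of arithmetic. This is a minimal sketch: the exchange rate of 57 pesos per dollar is only the rate implied by the text (1,140 pesos for US$20), not a live rate, and the function name is made up for illustration.

```python
# Worked arithmetic for the 20-person example in the text.
SEATS = 20
USD_PER_SEAT_PER_MONTH = 20
PHP_PER_USD = 57  # assumption: implied by "US$20 is about 1,140 pesos"


def annual_cost_php(seats: int, usd_per_seat: int, php_per_usd: int) -> int:
    """Yearly subscription cost in pesos for a team of `seats` people."""
    return seats * usd_per_seat * php_per_usd * 12


total = annual_cost_php(SEATS, USD_PER_SEAT_PER_MONTH, PHP_PER_USD)
print(f"{total:,} pesos per year")
```

Changing `SEATS` or `PHP_PER_USD` shows how quickly the fixed cost moves with team size and the exchange rate.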

Picture an office in Manila's BGC district. A development team leader asks a staff member visiting from the information-systems department at Japan headquarters: "I'd like to let all our developers use Claude Code, but headquarters won't approve the license cost. Is there any way around it?" The goal of this guide is to equip you to answer that question.

Step 2: Key Points From the Original Article (5 min)

Point	The facts the original article conveys
Claude Code's pricing structure	Claude Code itself can be installed for free. Costs arise from the API usage of the AI models (Sonnet, Opus, etc.) called behind the scenes
The official paid plans	A Claude Pro or Max plan subscription is required; Pro is US$20 per month
The method of making it free	Use Ollama (a mechanism for running AI models locally) and switch Claude Code's connection target to a model running on your own computer
The switching mechanism	Change the API connection target with the `ANTHROPIC_BASE_URL` environment variable, and specify the model to use with the `--model` flag
Recommended models	Qwen3.6 (two types, 27B and 35B, using about 17–24GB of memory), Gemma 4 (a 26B MoE model and a 31B model). For low-spec machines, there is also Gemma 4 E4B (about 5GB, 4-bit quantization)
Required hardware	An Apple Silicon Mac (32GB unified memory) or a high-VRAM GPU. With 16GB of memory, a small model can still run
Performance limits	It does not reach Opus's performance, but it is at a practical level for everyday coding work

Source: MakeUseOf, "There's a free way to use Claude Code — and it's surprisingly simple" (May 4, 2026)

This table was created for learning purposes from publicly available information. For details, please check the original article.

Step 3: Comprehension Check (5 min)

Q1. When using Claude Code, does the cost arise from the tool itself, or from the AI model running behind the scenes?

Hint: The original article explains "the harness (the framework of the tool)" and "the model" separately.

Q2. What is the monthly fee of the Claude Pro plan in US dollars?

Hint: The specific amount is given at the beginning of the original article.

Q3. What is the Ollama mechanism for?

Hint: Recall the keywords "local" and "your own computer."

Q4. What is the name of the environment variable for changing Claude Code's connection target?

Hint: It is a name beginning with "ANTHROPIC."

Q5. Name two of the local coding models the original article introduces.

Hint: One is an Alibaba-family model, and the other is a Google DeepMind-family model.

Part 2: Putting It Into Practice

Step 4: Implementation Steps in the Philippines (10 min)

The table below lays out the steps for introducing Claude Code with local models to an engineering team in the Philippines.

Step	Details	Philippines-specific notes
1. Agree on the goal of adoption	Put in writing, on both the headquarters and local sides, which usage you want to move to the free setup	Verbal agreements are common in Philippine workplaces, so record what was agreed in email or chat to prevent later misunderstandings
2. Procure recommended-spec hardware	Prepare an Apple Silicon Mac with 32GB of unified memory, or a high-VRAM GPU	Import duties on Apple products are high in the Philippines, and local prices run about 15–25% higher than in Japan. Also check BIR (Bureau of Internal Revenue) import procedures and the asset-booking rules that apply to a SEC-registered corporation
3. Install Ollama and obtain the model	Install Ollama and download Qwen3.6 or Gemma 4 with the `ollama pull` command	Office connection speeds in the Philippines vary, and downloading a model over 20GB takes time. Do the first download on a stable wired connection in the office
4. Switch Claude Code's connection target	Point `ANTHROPIC_BASE_URL` at an internal local server, and specify the model with `--model`	If you set up a shared local server, keep in mind NPC (National Privacy Commission) requirements under the data privacy law (DPA, Republic Act No. 10173), and put your internal data-handling rules in writing
5. Pilot operation and evaluation	Have a few engineers try it for about two weeks, and measure quality and speed on actual work	Philippine staff tend to hold back frank opinions, so an anonymous feedback form is an effective way to gather honest views
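Step 4 in the table can be sketched as a small launcher script. This is an illustration under stated assumptions, not a verified setup: the host name `inference-server.internal` and the model tag `qwen3.6:27b` are hypothetical placeholders, 11434 is Ollama's default port, and the sketch assumes the Claude Code CLI is installed as `claude`. Depending on your versions, additional authentication-related variables may also be needed.

```python
import os


def build_launch(base_url: str, model: str) -> tuple[dict[str, str], list[str]]:
    """Return the environment and argv for starting Claude Code against a local server."""
    env = dict(os.environ)
    # The environment variable named in the original article.
    env["ANTHROPIC_BASE_URL"] = base_url
    # The --model flag named in the original article; the tag itself is a placeholder.
    argv = ["claude", "--model", model]
    return env, argv


env, argv = build_launch("http://inference-server.internal:11434", "qwen3.6:27b")
print("ANTHROPIC_BASE_URL =", env["ANTHROPIC_BASE_URL"])
print("command:", " ".join(argv))
# To actually start the tool: subprocess.run(argv, env=env)
```

Keeping the URL and model name in one function makes it easy to hand every engineer the same settings during the pilot.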

Step 5: Common Failures and Countermeasures (5 min)

Failure pattern 1: "Trying to run it on everyone's laptops for now"

Bad example: You try to install Ollama on every local staff member's laptop, machines that cannot run the model for lack of memory pile up, and in the end no one uses it.

Good example: First, set up one dedicated inference server (hardware with 32GB or more of memory) in the office, and have each engineer's laptop connect to it. The author of the original article also says they run it this way.
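Before pointing laptops at the shared server, it helps to confirm which models it actually holds. The sketch below assumes the server runs Ollama and uses its `/api/tags` endpoint, which lists locally available models. The model tags in the sample payload are hypothetical placeholders, and the network call is defined but not executed here.

```python
import json
from urllib.request import urlopen


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def fetch_model_names(base_url: str, timeout: float = 5.0) -> list[str]:
    """Ask the inference server which models it has pulled."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))


# Offline example using the response shape (placeholder model tags).
sample = {"models": [{"name": "qwen3.6:27b"}, {"name": "gemma4:26b"}]}
print(model_names(sample))
```

Running `fetch_model_names` from an engineer's laptop also doubles as a quick connectivity check across the internal network.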

Failure pattern 2: "Replacing everything because it's free"

Bad example: In a rush to cut costs, you leave even important code generation for production projects to local models, the quality drops, and you receive complaints from the client.

Good example: Use local models for in-house tool development and for simple routine tasks. For important work involving client deliverables, keep using Sonnet or Opus on the paid plan. This gives you a two-tier operation.

Failure pattern 3: "Forgetting the context-window setting"

Bad example: You run the model with Ollama's default settings, processing stops partway through when you load long code, and the team concludes the tool is useless.

Good example: Ollama's default context window (the amount of text the model can handle at once) is small. As the comments on the original article also point out, raise it to 64K or more before you start operating. The harder the task, the more tokens it requires.
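One way to fix the context window is to build a model variant from an Ollama Modelfile that sets `num_ctx`. The sketch below only generates the Modelfile text. The base model tag is a hypothetical placeholder, and 65536 is simply 64K expressed in tokens.

```python
def modelfile_with_context(base_model: str, num_ctx: int = 64 * 1024) -> str:
    """Return Modelfile text that pins the context window for a base model."""
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


text = modelfile_with_context("qwen3.6:27b")  # placeholder tag
print(text)
```

Save the output as `Modelfile` and register it with `ollama create <new-name> -f Modelfile`, then pass the new name to `--model`. A larger context window uses more memory, so check the server's headroom after the change.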

Part 3: Going Deeper

Step 6: Related Technical Terms (5 min)

An LLM (Large Language Model) is an AI mechanism that learns from huge volumes of text and can read and write like a human. At Manila call centers, it is common for an LLM to draft replies to customers, which local staff then adjust to fit Philippine culture.

An API (Application Programming Interface) is a gateway for sending commands from your own software to an external service. Think of it as a dedicated service counter. When a Cebu development base exchanges data with a Japan-headquarters system, going through an API lets it safely send and receive only the necessary information.

An agentic coding tool (a self-directed coding-assistance program) is a mechanism in which the AI acts on its own without a person instructing it one step at a time. The AI reads files, makes judgments, and rewrites code by itself. At startups in the BGC district, because engineers are few in number, leaving routine code fixes to such tools lets them concentrate on their core design work.

A local LLM (a large language model run on your own machine) is a way of running AI only within your own computer or an in-house server, without relying on an external service. To comply with the Philippines' data privacy law, when you do not want to send customer information outside, a local LLM that is self-contained within the internal network is chosen.

Quantization (the simplification of numerical values) is a technique that makes an AI model lighter by rounding its internal numerical values from high precision to lower precision. The "4-bit quantization" mentioned in the original article is a case where this simplification is applied aggressively. When you want to run a model on the kind of ordinary laptop common in a Manila office, a quantized model can give practical speed even with limited memory.
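To make the idea concrete, here is a toy sketch of symmetric 4-bit quantization plus a rough weight-size estimate. It is a simplified illustration, not the scheme any particular model file uses. The size estimate covers weights only, so real memory use is higher once the context cache and runtime overhead are included.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def quantize_4bit(values: list[float]) -> tuple[list[int], float]:
    """Map floats to integer codes in [-7, 7] using one shared scale."""
    scale = max(abs(v) for v in values) / 7 or 1.0  # avoid a zero scale
    return [round(v / scale) for v in values], scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


codes, scale = quantize_4bit([0.7, -0.3, 0.12, 0.0])
print(codes)                      # small integers instead of full floats
print(dequantize(codes, scale))   # close to, but not exactly, the inputs
print(weight_memory_gb(27, 16), "GB at 16-bit vs", weight_memory_gb(27, 4), "GB at 4-bit")
```

The rounding error visible in the dequantized values is the quality cost paid for the smaller footprint.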

Step 7: Thinking About Applying It to Your Company (10 min)

Review the cost structure of your development-support tools

Prompt: How much per month does your development team spend on AI coding tools? Of that, what proportion of the work genuinely requires the performance of a frontier model (cutting-edge AI)? Conversely, how much of the work, such as simple refactoring or test-code generation, could a local model handle?

Next action: Extract the usage logs from the most recent month, and try classifying the work content into "for important projects" and "everyday light work."
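The classification in this next action can be sketched as follows. The log format, the `kind` labels, and the rule for what counts as light work are all hypothetical. Replace them with whatever your own usage logs actually record.

```python
from collections import Counter

# Assumption: these task kinds are the ones a local model could handle.
LIGHT_KINDS = {"refactor", "test-generation", "internal-script"}


def classify(kind: str) -> str:
    """Label one log entry using the two categories from the text."""
    return "everyday light work" if kind in LIGHT_KINDS else "for important projects"


def share_by_class(log: list[dict]) -> dict[str, float]:
    """Return the fraction of log entries that falls into each category."""
    counts = Counter(classify(entry["kind"]) for entry in log)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Made-up sample entries for illustration only.
sample_log = [
    {"kind": "refactor"},
    {"kind": "test-generation"},
    {"kind": "client-feature"},
    {"kind": "internal-script"},
]
print(share_by_class(sample_log))
```

The resulting shares give a first estimate of how much usage could move to a local model.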

Put data-handling rules at your local site in order

Prompt: Does the source code your local Philippine engineers handle contain customer information or internal confidential information? In light of the data privacy law and your contracts with customers, is it acceptable to send that to an external cloud API?

Next action: Together with the legal department, summarize the internal rules for sending source code to AI into a one-page document.

Build an evaluation process that starts with small experiments

Prompt: When you propose a new tool, does headquarters approval take a long time to come through? Could your company adopt the approach of "test small first, show the results in numbers, and then expand"?

Next action: Set a two-week trial period and create an evaluation sheet that measures the quality of code generation and the time required.

Part 4: FAQ

Q1. Once I switch to a local model, is it OK to completely cancel the US$20/month Claude Pro plan?

We do not recommend full cancellation. The author of the original article also keeps the Claude Pro subscription and hands only light work to local models. At Philippine development sites too, the accuracy of a frontier model is needed for important work involving client deliverables. Using both and dividing them by use case is more realistic.

Q2. Can I obtain a model over 20GB without problems even on a Philippine office line?

It depends on the office's line quality. Fiber lines are widespread in the main business districts of Manila and Cebu, but power outages and line instability still occur. Do the first model download over the office's stable wired connection, outside business hours. Once a model is stored on an in-house server, individual engineers do not need to download it again.
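A rough transfer-time estimate helps when scheduling that first download. The line speeds below are hypothetical examples, and the calculation assumes the line sustains its nominal speed, which real office lines often do not.

```python
def download_minutes(size_gb: float, mbps: float) -> float:
    """Estimated minutes to download `size_gb` gigabytes at `mbps` megabits per second."""
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits in decimal units
    return megabits / mbps / 60


for speed in (25, 100, 300):  # hypothetical office line speeds in Mbps
    print(f"20 GB at {speed} Mbps: about {download_minutes(20, speed):.0f} minutes")
```

If the estimate runs past an hour, an overnight download on the wired connection is the safer plan.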

Q3. When explaining to Japan headquarters' IT department, what should I emphasize to get approval more easily?

It is effective to organize and convey three points. First, Claude Code itself is free, and costs arise on a per-model-call basis. Second, with local models, internal data does not go outside, making it easier to comply with the data privacy law. Third, because you start with a trial run by a few people, the initial investment is limited.

Q4. How should I educate local engineers so they use it safely?

For education, we recommend distributing a one-page manual summarizing "what you may do" and "what you must not do." Because Philippine engineers are accustomed to technical documents in English, preparing both an English version and a Japanese version makes it easier to confirm mutual understanding. Always set aside time for Q&A and draw out frank questions.

Q5. Headquarters worries that "quality will drop with a local model." How should I answer?

As the original article also acknowledges, local models do not match Opus in quality. However, in everyday coding work the difference is smaller than expected, and local models are practical enough if you choose the use case. Propose to headquarters a policy that limits local models to work with modest quality requirements, such as creating internal scripts, generating test code, and refactoring. Pairing this with the two-tier approach of keeping important projects on a frontier model makes the proposal more persuasive.

Tips for Making It Work (3 Tips)

Tip 1: Use "frontier models" and "local models" differently by use case

Moving everything to local models can end up raising costs through rework caused by lower quality. Decide on a clear division at the outset: leave client deliverables to a paid frontier model, and assign internal scripts and simple fixes to a local model. Simply classifying the work at the start of each month curbs wasteful spending.

Tip 2: Set up one shared inference server and have all developers use it

Putting a heavy model on each engineer's laptop causes operations to break down through memory shortages and differences between individual machines. Place one dedicated inference machine with 32GB of memory in your Philippine office, and have everyone connect to it over the internal network. This makes it easier to keep settings uniform and to respond when trouble occurs.

Tip 3: Collect numbers from a two-week trial before proposing to headquarters

Headquarters does not approve proposals based on impressions. Record, before and after adoption, the time spent on code generation, the share of generated code that runs as-is, and the number of defects. Summarizing the results of a two-week trial by a few local staff in a single report, with the cost-effectiveness shown in both pesos and yen, speeds up headquarters' decision-making.
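The before-and-after comparison can be kept in a very small script. The record fields and the sample figures below are made-up placeholders to show the shape of the summary, not measured results.

```python
from statistics import mean


def summarize(records: list[dict]) -> dict:
    """Aggregate trial records into the three metrics named in the text."""
    return {
        "avg_minutes": mean(r["minutes"] for r in records),           # time per task
        "run_rate": sum(r["ran_ok"] for r in records) / len(records),  # share of code that ran
        "defects": sum(r["defects"] for r in records),                 # total defects found
    }


# Placeholder data: two tasks measured before adoption and two after.
before = [
    {"minutes": 40, "ran_ok": True, "defects": 2},
    {"minutes": 50, "ran_ok": False, "defects": 1},
]
after = [
    {"minutes": 30, "ran_ok": True, "defects": 1},
    {"minutes": 20, "ran_ok": True, "defects": 0},
]
print("before:", summarize(before))
print("after: ", summarize(after))
```

Putting the two summaries side by side in the report gives headquarters the numbers it asks for.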

Bonus: How to Use PH AI Works

PH AI Works is an AI and technology company with a base in the Philippines. We provide support for introducing AI coding tools and building local-LLM environments for Japanese companies' local development teams. We can also handle compliance with the data privacy law (DPA) and bilingual Japanese–English education for local staff.

The topics you can consult us on for free are as follows.

First, help with selecting AI coding tools suited to your development team and estimating costs. We make the division between frontier models and local models visible with concrete peso-denominated figures.

Second, support building a local-LLM environment using Ollama. We support everything from selecting the inference server to place in your Manila or Cebu office to the connection setup with Claude Code.

Third, creating education programs for local engineers. We help with English and Japanese manuals and with creating rules for safe use.

Please feel free to contact us.

Citations and References

MakeUseOf — "There's a free way to use Claude Code — and it's surprisingly simple" (2026-05-04)
