How Function Calling Works (and Where It Breaks) [AI Deep Dives, Chapter 6]

George, Chief of AI at Moveo
September 26, 2025 · ✨ AI Deep Dives
If you've been following our "AI Deep Dives" series, you know we've already explored the limitations of approaches like RAG and the inadequacy of the "Prompt & Pray" strategy for critical business tasks.
In our previous chapter, "From Prompt & Pray to Tools & Pray", we started to question whether tools were the magic bullet. In this sixth chapter, we take a deeper dive into function calling: the ability of large language models (LLMs) to interact with tools and APIs.
We'll see how this functionality works, why it's so popular, and, most importantly, why it becomes a source of problems when applied in corporate environments that demand rigor, security, and governance.
Let's get into it!
What is Function Calling?
Function calling (or tool calling) is a feature that allows an LLM to interact with the outside world. Essentially, you provide the model with a list of tools (functions), along with their names, descriptions, and the expected arguments. During a conversation, the interaction follows a simple loop (sketched in code below):
1. The model selects a tool (e.g., `getTransactions`).
2. The model fills in the arguments (e.g., `{ "date": "2025-09-01" }`).
3. Your application executes the tool and returns the result.
4. The model uses the result to continue the conversation or call another tool.
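Put together, the loop looks something like the minimal sketch below. The tool schema follows the widely used OpenAI-style JSON Schema convention, and `callModel` is a mock stand-in for whichever LLM client you actually use; both are illustrative assumptions, not a specific vendor's API.

```typescript
// Minimal sketch of the function-calling loop. The tool schema follows the
// widely used OpenAI-style JSON Schema convention; `callModel` is a mock
// stand-in for a real LLM client, included only so the sketch runs.

type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; arguments: string }; // arguments arrive as a JSON string
type ModelReply = { text?: string; toolCall?: ToolCall };

// 1. Describe each tool: name, description, expected arguments.
const tools = [{
  name: "getTransactions",
  description: "List the user's transactions for a given date.",
  parameters: {
    type: "object",
    properties: { date: { type: "string", description: "ISO date, e.g. 2025-09-01" } },
    required: ["date"],
  },
}];

// Mock model: on the first turn it picks a tool; after seeing a tool result it answers.
async function callModel(messages: Message[], _tools: unknown[]): Promise<ModelReply> {
  const sawToolResult = messages.some((m) => m.role === "tool");
  return sawToolResult
    ? { text: "You had one transaction of $42.50 on 2025-09-01." }
    : { toolCall: { name: "getTransactions", arguments: '{"date":"2025-09-01"}' } };
}

// 3. Your application owns the actual execution of each tool.
async function getTransactions(date: string) {
  return [{ id: "txn_123", date, amount: 42.5 }]; // stubbed data
}

// Steps 2-4 as a loop: the model selects a tool and fills arguments, the app
// executes, and the result is fed back until the model replies with plain text.
async function run(userMessage: string): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userMessage }];
  while (true) {
    const reply = await callModel(messages, tools);
    if (!reply.toolCall) return reply.text ?? ""; // model answered directly
    const args = JSON.parse(reply.toolCall.arguments);
    const result =
      reply.toolCall.name === "getTransactions"
        ? await getTransactions(args.date)
        : { error: `Unknown tool: ${reply.toolCall.name}` };
    messages.push({ role: "tool", content: JSON.stringify(result) });
  }
}

run("What did I spend on September 1st?").then(console.log);
```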
This approach works perfectly for simple, low-risk tasks like checking the weather, looking up a CRM record, or doing a quick calculation. If the model makes a small mistake, the business risk is minimal. This is why many developers and teams love the simplicity and flexibility it offers.
Why Function Calling Breaks in Real-World Enterprises
Despite its usefulness, function calling isn't the definitive solution for all enterprise challenges. In complex, high-stakes, customer-facing environments, it fails in unpredictable and dangerous ways. In practice, failures fall into five categories:
1. Tool Selection at Scale
As the number of available tools increases, the model can struggle to choose the correct action. Imagine a support assistant that can `createRefund()`, `initiateChargeback()`, or `openDispute()`. If a user requests to "dispute a transaction" but the model instead triggers a refund or chargeback, the outcome could be financial disorder, complicated reconciliations, and even potential compliance violations.
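To see why selection gets hard, consider a hypothetical set of tool definitions for that assistant. To the model they differ only in their natural-language descriptions, and everyday phrasing like "dispute a transaction" plausibly matches all three:

```typescript
// Hypothetical tool definitions for the support assistant described above.
// Nothing but the description distinguishes them, and ambiguous user phrasing
// gives the model no reliable way to pick the right one every time.
const paymentTools = [
  { name: "createRefund", description: "Return money to the customer for a transaction." },
  { name: "initiateChargeback", description: "Reverse a transaction through the card network." },
  { name: "openDispute", description: "Open a dispute case for a contested transaction." },
];
```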
→ Read Chapter 4: AI for Payments: Why "Prompt & Pray" fails and what scales safely
2. Argument Correctness
Even if the model selects the right tool, it can fill in the arguments incorrectly. For example, an assistant might cite a transaction ID that looks right but doesn't belong to the user, or set a plan's `start_date` incorrectly. Small errors like these lead to incorrect actions that can cause major headaches, such as failed disputes or incorrect charges.
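The usual defense is to validate every model-supplied argument in your own code before anything executes. A minimal sketch, where `db`, `openDispute`, and the record shape are hypothetical illustrations, not a specific SDK:

```typescript
// Argument validation outside the model: every model-supplied ID is checked
// for ownership before any side effect runs.
type Txn = { id: string; userId: string };

const db = {
  async getTransaction(id: string): Promise<Txn | undefined> {
    const txns: Txn[] = [{ id: "txn_123", userId: "user_1" }]; // stubbed store
    return txns.find((t) => t.id === id);
  },
};

async function openDispute(txnId: string) {
  return { disputeId: "dsp_1", txnId }; // stubbed side effect
}

async function safeOpenDispute(userId: string, transactionId: string) {
  const txn = await db.getTransaction(transactionId);
  if (!txn || txn.userId !== userId) {
    // A model-picked ID that "looks right" but belongs to someone else stops here.
    throw new Error(`Transaction ${transactionId} does not belong to ${userId}`);
  }
  return openDispute(txn.id);
}

safeOpenDispute("user_2", "txn_123").catch((e) => console.error(e.message)); // rejected
```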
3. Order and State of the Workflow
In multi-step workflows, such as those requiring authentication before a transaction, function calling cannot guarantee that steps execute in the proper order; there is no dependable mechanism to ensure the model consistently follows the correct sequence.
For instance, the model might create a plan or dispute a transaction before authenticating the user or obtaining explicit consent. Skipped steps, the absence of retries or rollbacks, and the persistence of hallucinations make this approach unsuitable for critical processes.
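What critical processes need instead is ordering enforced outside the model. In the sketch below (all names and states illustrative), a tiny state machine makes authentication and consent hard preconditions, so a premature tool call fails instead of executing:

```typescript
// What function calling alone doesn't provide: sequencing enforced in code,
// not in the prompt. States, methods, and IDs are illustrative.
type Step = "start" | "authenticated" | "consented" | "done";

class DisputeWorkflow {
  private step: Step = "start";

  authenticate(otpValid: boolean) {
    if (this.step !== "start") throw new Error(`Unexpected state "${this.step}"`);
    if (!otpValid) throw new Error("Invalid OTP; authentication failed");
    this.step = "authenticated";
  }

  recordConsent(userAgreed: boolean) {
    if (this.step !== "authenticated") throw new Error("Authenticate before consent");
    if (!userAgreed) throw new Error("Explicit consent is required");
    this.step = "consented";
  }

  openDispute(txnId: string) {
    // The model may *request* this at any time; the workflow only allows it here.
    if (this.step !== "consented") throw new Error(`Cannot open dispute from state "${this.step}"`);
    this.step = "done";
    return { disputeId: "dsp_1", txnId };
  }
}

const wf = new DisputeWorkflow();
wf.authenticate(true); // skipping this line would make openDispute() throw
wf.recordConsent(true);
console.log(wf.openDispute("txn_123")); // the side effect runs only in order
```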
4. Business Logic in Natural Language
Trying to encode complex rules ("send OTP before creating a plan, unless the user is verified") directly into prompts is an invitation to disaster. These natural language instructions are ambiguous and brittle, and minor wording changes can drastically alter the model's behavior. This makes it nearly impossible to ensure consistency and compliance at scale.
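For contrast, the same rule expressed as code has exactly one interpretation of "verified" and cannot drift with prompt wording. A minimal sketch, with `User`, `sendOtp`, and `createPlan` as hypothetical stand-ins:

```typescript
// "Send OTP before creating a plan, unless the user is verified" as a
// deterministic policy rather than a natural-language instruction.
type User = { id: string; verified: boolean };

async function sendOtp(_user: User) { /* deliver a one-time password */ }
async function createPlan(user: User) { return { planId: "pln_1", userId: user.id }; }

async function createPlanWithPolicy(user: User) {
  if (!user.verified) {
    await sendOtp(user); // unverified users always hit the OTP step, no exceptions
    throw new Error("OTP verification pending; plan not created yet");
  }
  return createPlan(user);
}

createPlanWithPolicy({ id: "user_1", verified: true }).then(console.log);
```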
5. Compliance and Audit
In regulated industries, certain disclosures, tone controls, and records of each step are mandatory. The model cannot guarantee that a required phrase, such as "This is a communication from a debt collector," will be included.
If the model makes a tonal error or skips a consent step, the company could face fines, reputational damage, and a lack of essential artifacts for auditors.
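Guarantees like these have to live in code, not in the prompt. A minimal sketch (all names illustrative) that enforces the disclosure programmatically and records every step for auditors:

```typescript
// Guardrails the model cannot guarantee on its own: the required disclosure
// is enforced in code, and every step lands in an audit log.
const REQUIRED_DISCLOSURE = "This is a communication from a debt collector.";

type AuditEntry = { at: string; step: string; detail: string };
const auditLog: AuditEntry[] = [];

function record(step: string, detail: string) {
  auditLog.push({ at: new Date().toISOString(), step, detail });
}

function sendCollectionMessage(draft: string): string {
  // Prepend the disclosure regardless of what the model generated.
  const message = draft.startsWith(REQUIRED_DISCLOSURE)
    ? draft
    : `${REQUIRED_DISCLOSURE} ${draft}`;
  record("disclosure_enforced", message);
  return message;
}

console.log(sendCollectionMessage("Your balance of $120 is 30 days past due."));
console.log(auditLog); // step-by-step artifacts for auditors
```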
Here's a summary of the five failure modes:

| Where it breaks down | What the user sees | Why it's a problem for the enterprise |
| --- | --- | --- |
| Tool selection at scale | Model calls `createRefund()` instead of `openDispute()`. | Incorrect financial action; reconciliation chaos and potential clawbacks; compliance exposure; undoing may be complex or irreversible. |
| Argument correctness | Assistant cites a transaction ID that looks right but isn't the user's, or sets a plan's `start_date` incorrectly. | Disputes fail or hit the wrong item; misconfigured plans; reconciliation and regulatory headaches. |
| Order & state | Assistant creates a plan or disputes a transaction before authenticating or without explicit consent. | Prompts cannot enforce strict sequencing: models skip steps, there are no retries or rollbacks, and hallucination risk persists. |
| Business logic in prompts | Instruction is vague: "Send OTP before creating a plan, unless the user is verified." Model interprets "verified" loosely and skips the OTP. | Natural-language rules are ambiguous and brittle; behavior varies with tiny wording changes; consistent compliance is impossible to prove. |
| Compliance & audit | Missing required disclosure (e.g., "This is a communication from a debt collector") or coercive phrasing ("You must pay today"). | Regulatory exposure, fines, reputational damage; poor or missing artifacts for auditors (consent, disclosures, step logs). |
What’s missing from Function Calling?
When critical steps, such as consent, authentication, or database writes, live inside the model’s prompt, there’s no assurance they will happen, happen in the right order, or be executed correctly. Function calling, on its own, lacks the control and predictability needed for complex enterprise environments.
This is why businesses can't rely solely on it. They must go beyond it, combining the flexibility of language models with an architecture that provides control, governance, and predictability.
→ Read Chapter 2: The great AI debate: Wrappers vs. Multi-Agent Systems in enterprise AI
The journey continues...
Function calling is powerful, as it gives LLMs the ability to interact with the outside world and even execute real actions. But this power doesn’t solve the underlying challenge. Function calling is not a control plane: it doesn’t guarantee accuracy, enforce the correct order of steps, or provide the reliability and governance enterprises require.
Instead, it should be seen as just one component within a broader enterprise framework. On its own, it amplifies what a model can do; combined with the right architecture, it becomes part of a system that ensures rigor, compliance, and trust.
In our next installment, Chapter 7: The Moveo.AI approach (deeper), we'll dive into how a hybrid architecture, one that combines the intelligence of LLMs with the rigor of deterministic dialog flows, can solve the challenges we've explored.
Are you ready to discover how to build AI systems that are both smart and, most importantly, reliable? Stay tuned!