API Security Best Practices: What Works in Production

Most API security advice can be described by the following statement:

"Theory and practice are the same in theory, but not in practice."

There's the theoretical, outside-in look, based on CVE databases, security blogs, and compliance checklists. Then there's the actual threat picture, drawn from runtime data on 3 billion API requests across thousands of APIs, which shows the real risks happening every day. Can't argue with (traffic) data.

API security is the practice of protecting APIs from unauthorized access, abuse, and data exposure across their full lifecycle, from design through production monitoring. It covers authentication, authorization, input validation, rate limiting, and continuous traffic analysis.

Why API security fails in production

"No plan survives first contact with the enemy."

The gap (in this case, a chasm) between secure-by-design and secure-in-production is wider for APIs than for almost any other surface. Treblle's 2025 data makes the scale of that gap concrete: 47% of APIs process requests without any authentication, 42% of API traffic still travels over unencrypted HTTP, and only 15% of APIs implement rate limiting.

The Global API Scorecard for 2025 came in at 58/100, a failing grade by any academic measure, despite world-class performance improvements in the same period. There are a few reasons this gap persists.

APIs tend to proliferate faster than documentation or governance can keep up. A team ships a new endpoint, it gets used, the team changes, and six months later, nobody is certain what authentication mechanism protects it or what data it returns. It's chaos.

Treblle's traffic analysis consistently surfaces APIs with no authentication at all. In most cases, the auth layer was never added: the endpoint was created quickly and the gap was never caught.

The other structural problem is that APIs are tested at design time but abused at runtime. A penetration test run on a staging environment doesn't tell you how your API behaves at 3am when a credential-stuffing bot runs 40,000 login attempts from rotating IP addresses. It doesn't surface the endpoint that returns slightly too much user data in its response body because a developer added a field for debugging and nobody removed it.

Security tooling designed for web applications often misses API-specific risks. A WAF looking for malicious HTML injection won't catch a broken object-level authorization vulnerability where a user modifies a resource ID in a JSON payload to access another user's data. The attack surface is different, so the protection has to be, too.

The OWASP API Security Top 10 in plain terms

The OWASP API Security Top 10 is the clearest published framework for categorizing API-specific vulnerabilities. Unlike the original OWASP Top 10, which covers web application risks, this list was built specifically around how APIs get compromised. The 2023 edition is the current reference.

The categories worth understanding in depth:

Broken Object Level Authorization (BOLA)

BOLA is the most common API vulnerability in production. An API that lets a user fetch /orders/1234 without verifying that the order belongs to the requesting user is vulnerable. The fix sounds simple (validate that the authenticated user owns the resource), but it requires consistent implementation across every endpoint, and in large APIs with many contributors, that consistency breaks down. It's simple to do, but also simple not to do.
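The ownership check described above can be sketched in a few lines of Python. This is a minimal illustration, not a framework-specific implementation: the `Order` type, the in-memory `ORDERS` store, the `NotFound` exception, and `get_order` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    owner_id: str
    total_cents: int


# Hypothetical in-memory store standing in for a database.
ORDERS = {
    1234: Order(order_id=1234, owner_id="alice", total_cents=4200),
    1235: Order(order_id=1235, owner_id="bob", total_cents=1999),
}


class NotFound(Exception):
    """Raised for missing orders and for orders the caller does not own."""


def get_order(authenticated_user_id: str, order_id: int) -> Order:
    order = ORDERS.get(order_id)
    # Treat "not yours" the same as "does not exist" so the response
    # does not reveal which order IDs are valid.
    if order is None or order.owner_id != authenticated_user_id:
        raise NotFound(f"order {order_id} not found")
    return order
```

The important detail is that the user ID comes from the authenticated session or token, never from the request path or body.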

Broken Authentication

Broken Authentication covers a range of failures: tokens that don't expire, API keys embedded in client-side code, JWT signatures that aren't validated server-side, and authentication flows that can be bypassed. Treblle's scanning checks for authentication presence on every request, but the quality of that authentication (whether the token is actually validated, whether it's scoped correctly) requires design-time review.

Broken Object Property Level Authorization (BOPLA)

BOPLA is the 2023 category that merges what the 2019 list called Excessive Data Exposure and Mass Assignment. An API endpoint that returns a full user object, including password hashes, internal flags, or billing information, when a client only needs a name and email, is exposing data it shouldn't. The fix is explicit field selection in responses, not returning everything and hoping the client ignores the sensitive parts. "Hope" isn't a security strategy.
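Explicit field selection can be as simple as building the response from an allowlist. A minimal sketch, assuming a user record stored as a dictionary; the field names and `to_public_user` are hypothetical.

```python
# Hypothetical allowlist: the only fields this endpoint is meant to expose.
PUBLIC_USER_FIELDS = ("id", "name", "email")


def to_public_user(user_record: dict) -> dict:
    """Build the response from an allowlist instead of deleting known-bad keys."""
    return {
        field: user_record[field]
        for field in PUBLIC_USER_FIELDS
        if field in user_record
    }


stored_user = {
    "id": 7,
    "name": "Ada",
    "email": "ada@example.com",
    "password_hash": "not-a-real-hash",
    "is_internal_tester": True,
}

public_view = to_public_user(stored_user)
```

An allowlist fails safe: a field added to the stored record later stays out of the response until someone adds it deliberately.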

Unrestricted Resource Consumption

Unrestricted Resource Consumption covers missing or inadequate rate limiting. An endpoint with no rate limit is an open invitation to scraping, brute-force attacks, and denial-of-service attacks. This is one of the most consistently absent protections in production APIs. See the rate limiting section below for implementation specifics.

Broken Function Level Authorization (BFLA)

BFLA is distinct from BOLA. Where BOLA is about accessing the wrong data object, BFLA is about calling functions you shouldn't have access to. An admin endpoint that's only hidden from the UI but not protected at the API layer is a BFLA vulnerability, as in the FIA Max Verstappen case.

The remaining five categories (Unrestricted Access to Sensitive Business Flows, Server-Side Request Forgery, Security Misconfiguration, Improper Inventory Management, and Unsafe Consumption of APIs) each address specific attack patterns worth understanding.

Authentication and authorization: where most teams get it wrong

Authentication (verifying who is making a request) and authorization (verifying what they're allowed to do) are distinct problems that often get conflated in implementation.

The authentication layer establishes identity. 47% of APIs process requests with no authentication at all, and only 6% of all API errors are 401 Unauthorized responses, which suggests the problem isn't just missing auth but missing enforcement: systems aren't rejecting unauthenticated users because they aren't checking for them. Even the DMV asks for your ID card to verify your identity.

For most APIs, this means validating a JWT, OAuth token, or API key on every incoming request. The failure modes here are more subtle than "no authentication": for example, accepting expired tokens, failing to validate the token's issuer or audience fields, or issuing API keys with no expiry and no rotation mechanism.

JWT validation deserves specific attention

A JWT has three parts: a header, a payload, and a signature. The signature is what prevents tampering, but it only helps if the server verifies it against the right secret or public key, with an algorithm the server has chosen, and then checks the expiry claim. Libraries that accept alg: none in the token header, which disables signature verification, are a known attack vector. The fix is to always explicitly specify the accepted algorithm in your validation code, rather than trusting what the token itself claims.
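To make those checks concrete, here is a sketch of HS256 validation using only the Python standard library. It is an illustration of the steps, not a replacement for a maintained JWT library; `sign_hs256` and `validate_hs256` are hypothetical helpers, and the sketch assumes the audience claim is a single string.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: bytes, header: Optional[dict] = None) -> str:
    """Helper used only to build example tokens."""
    header = header or {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def validate_hs256(token: str, secret: bytes, issuer: str, audience: str,
                   now: Optional[float] = None) -> dict:
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except (ValueError, TypeError) as exc:
        raise ValueError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(payload, dict):
        raise ValueError("malformed token")

    # The accepted algorithm is fixed by the server; the header is only
    # compared against it, never used to pick the verification method.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")

    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad signature")

    now = time.time() if now is None else now
    exp = payload.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise ValueError("token expired or missing exp")
    if payload.get("iss") != issuer:
        raise ValueError("unexpected issuer")
    if payload.get("aud") != audience:
        raise ValueError("unexpected audience")
    return payload
```

A token whose header says alg: none is rejected at the algorithm check, before any signature logic runs.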

API key management

API key management is often treated as simpler than OAuth, but carries its own failure modes. Keys embedded in client-side JavaScript are exposed to anyone who opens developer tools. An API key with no expiry persists long after the user or service that needed it has been decommissioned. Scope restrictions are also frequently absent, allowing a single key to access any endpoint the API exposes. A basic API key policy should cover: server-side storage only, defined expiry, scope restrictions where possible, and a rotation mechanism.
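A key policy along those lines might look like the following sketch. The `ApiKeyRecord` shape, the scope names, and `check_api_key` are assumptions for illustration. Storing only a hash of the key is one common way to keep the raw value out of your database.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ApiKeyRecord:
    key_sha256: str        # only a hash of the key is stored server-side
    scopes: frozenset      # e.g. {"orders:read"} (hypothetical scope names)
    expires_at: datetime   # every key has a defined expiry


def hash_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def check_api_key(raw_key: str, required_scope: str,
                  record: ApiKeyRecord, now: datetime) -> bool:
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(hash_key(raw_key), record.key_sha256):
        return False
    if now >= record.expires_at:
        return False
    return required_scope in record.scopes
```

Rotation then becomes issuing a new record with a later expiry and letting the old one lapse.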

Authorization

Authorization, meaning what an authenticated user can do, is where most breaches in production APIs actually happen. The BOLA and BFLA categories above are both authorization failures. The underlying problem is that authorization logic is often implemented inconsistently across endpoints: one developer adds an ownership check, another forgets it, and the inconsistency isn't caught until a security review or, worse, an incident.

The most defensible approach is centralized authorization: a single layer or service that handles ownership and permission checks consistently, rather than leaving each endpoint to implement its own logic. Whether you use an OPA policy, a middleware layer, or a permissions service depends on your stack, but the goal is the same: authorization decisions happen in one place, not scattered across hundreds of route handlers.
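As a rough sketch of what "one place" can mean, the function below makes both the function-level and the object-level decision. The role table, the `User` and `Resource` types, and the action names are hypothetical; a real system might delegate the same decision to a policy engine or permissions service.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission table.
ROLE_PERMISSIONS = {
    "customer": {"order:read", "order:cancel"},
    "admin": {"order:read", "order:cancel", "order:refund", "user:delete"},
}


@dataclass(frozen=True)
class User:
    user_id: str
    role: str


@dataclass(frozen=True)
class Resource:
    kind: str
    owner_id: str


def authorize(user: User, action: str, resource: Resource) -> bool:
    """Single decision point called by every route handler."""
    # Function-level check: may this role perform the action at all?
    if action not in ROLE_PERMISSIONS.get(user.role, set()):
        return False
    # Object-level check: non-admins may only act on their own resources.
    if user.role == "admin":
        return True
    return resource.owner_id == user.user_id
```

Route handlers call `authorize` and nothing else, so a forgotten ownership check becomes a missing call that is easy to spot in review.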

OAuth scopes

OAuth scopes are worth calling out specifically. A token with an overly broad scope (read:all, write:all, or admin) can be used to access far more than the issuing application intended. Scoped tokens should grant the minimum access a client needs for its function. A mobile app that needs to read a user's profile doesn't need write access to their billing information. When a token is compromised, a narrow scope limits the blast radius significantly compared to a broadly scoped one.
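A scope check on the server side is short. The sketch below assumes the token's scope arrives as a space-delimited string, as OAuth 2.0 defines it; the scope names and function names are hypothetical.

```python
def granted_scopes(scope_claim: str) -> set:
    # OAuth 2.0 expresses scope as a space-delimited list of strings.
    return set(scope_claim.split())


def is_allowed(scope_claim: str, required: set) -> bool:
    """True only if every scope the endpoint requires was granted."""
    return required <= granted_scopes(scope_claim)


# A mobile client's token that can read a profile but not touch billing.
mobile_token_scope = "profile:read"
can_read_profile = is_allowed(mobile_token_scope, {"profile:read"})
can_write_billing = is_allowed(mobile_token_scope, {"billing:write"})
```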

Rate limiting and abuse detection

Rate limiting is one of the most straightforward protections to implement and one of the most frequently absent in production. Only 15% of APIs implement rate limiting, leaving 85% of production APIs unprotected against volumetric abuse, credential stuffing, or resource exhaustion. This applies to authenticated endpoints as much as public ones.

A basic rate limiting strategy has three components: a limit (how many requests), a window (over what time period), and an identifier (per IP, per API key, per user, or some combination).

The choice of identifier matters. IP-based rate limiting is easy to circumvent with rotating IPs; API key or user-based limits are harder to route around but require the request to be authenticated first.

For public-facing APIs, layered rate limiting makes sense: a generous IP-based limit to stop raw volumetric attacks, a tighter per-key limit to control individual consumer usage, and endpoint-specific limits for expensive operations like search or data export.
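The limit, window, and identifier map directly onto a fixed-window counter, and layering is just several counters checked in sequence. This is a minimal in-memory sketch with hypothetical limits and names. A production limiter would normally use a shared store and evict old windows.

```python
from collections import defaultdict


class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        # (identifier, window index) -> requests seen in that window
        self._counts = defaultdict(int)

    def allow(self, identifier: str, now: float) -> bool:
        bucket = (identifier, int(now // self.window_seconds))
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True


# Hypothetical numbers: generous per IP, tighter per key, tightest for exports.
per_ip = FixedWindowLimiter(limit=1000, window_seconds=60)
per_key = FixedWindowLimiter(limit=100, window_seconds=60)
per_key_export = FixedWindowLimiter(limit=5, window_seconds=60)


def allow_request(ip: str, api_key: str, path: str, now: float) -> bool:
    layers = [(per_ip, f"ip:{ip}"), (per_key, f"key:{api_key}")]
    if path.startswith("/export"):
        layers.append((per_key_export, f"export:{api_key}"))
    # all() stops at the first layer that rejects the request.
    return all(limiter.allow(identifier, now) for limiter, identifier in layers)
```

One design consequence of this sketch: a request rejected by a later layer still counts against the earlier ones, which is usually acceptable for abuse control.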

The HTTP 429 Too Many Requests response should include a Retry-After header indicating when the limit resets. This is a small implementation detail that dramatically reduces support load, since clients that know when to retry don't need to contact you when they're rate-limited.
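For a fixed-window limiter, the Retry-After value is the number of seconds until the current window ends. A small sketch, with `rate_limited_response` as a hypothetical helper that returns a status, headers, and body rather than using any particular web framework:

```python
import math


def rate_limited_response(now: float, window_seconds: int) -> tuple:
    # The current window ends at the next multiple of window_seconds.
    window_end = (math.floor(now / window_seconds) + 1) * window_seconds
    # Retry-After as delay-seconds; round up so clients never retry early.
    retry_after = max(1, math.ceil(window_end - now))
    headers = {"Retry-After": str(retry_after)}
    body = {"error": "rate_limited", "retry_after_seconds": retry_after}
    return 429, headers, body


status, headers, body = rate_limited_response(now=125.0, window_seconds=60)
```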

Abuse detection goes beyond rate limiting. Behavioral patterns that warrant investigation include:

a single IP or key accessing a sequential range of resource IDs (a BOLA scanning pattern),
requests that always fail authentication but keep trying with slightly different credentials (credential stuffing), and
traffic spikes on endpoints that are expensive to serve.

Treblle's consumer fingerprinting tracks these patterns at the request level, associating device, user agent, and behavioral signatures to surface anomalies that pure rate limiting wouldn't catch.
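The first of those patterns, sequential resource IDs from one consumer, is simple enough to sketch. This is an illustrative heuristic, not Treblle's detection logic; the function names and the threshold of 20 are assumptions.

```python
def longest_sequential_run(resource_ids: list) -> int:
    """Length of the longest run of consecutive IDs, in request order."""
    longest = current = 1 if resource_ids else 0
    for previous, this in zip(resource_ids, resource_ids[1:]):
        current = current + 1 if this == previous + 1 else 1
        longest = max(longest, current)
    return longest


def looks_like_enumeration(resource_ids: list, threshold: int = 20) -> bool:
    # A single consumer walking IDs in order is rarely organic usage.
    return longest_sequential_run(resource_ids) >= threshold
```

Feed it the resource IDs requested by a single IP or key within a time window; a legitimate client's IDs tend to be scattered, while a scanner's tend to be ordered.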

Shadow APIs and undocumented endpoints

Shadow APIs are endpoints in your infrastructure that aren't in your official API inventory. They appear when teams move fast: a developer adds a debugging endpoint that never gets removed, a service gets deployed to production without going through the standard API registry, or an old API version remains live after the deprecation date passes.

The risk is that shadow APIs don't receive the same level of attention as documented APIs. They may lack authentication, are rarely included in security scans, and often return more data than they should because they were built for internal use and never hardened for external exposure.

Discovering shadow APIs requires observing actual traffic, not just reading your specs.

Comparing what your OpenAPI specification describes against what your API actually receives in production reveals gaps:

endpoints being called that have no spec entry,
fields being returned that aren't in the response schema,
behaviors that have drifted from what the documentation says.

Treblle's spec drift detection continuously compares your specification against live traffic, surfacing endpoints that exist in the real world but are not in your documented inventory.
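At its core, the endpoint-level comparison is a set difference between documented operations and observed ones. The sketch below is a simplified illustration, not how Treblle implements drift detection; it assumes numeric path segments are resource IDs, and `normalize_path` and `find_drift` are hypothetical names.

```python
import re


def normalize_path(path: str) -> str:
    # Assumption: purely numeric path segments are resource IDs.
    return re.sub(r"/\d+(?=/|$)", "/{id}", path)


def find_drift(spec_operations: set, observed_requests: list) -> dict:
    observed = {(method, normalize_path(path)) for method, path in observed_requests}
    return {
        # Called in production but absent from the spec: shadow API candidates.
        "undocumented": sorted(observed - spec_operations),
        # In the spec but never called: candidates for decommissioning.
        "unused": sorted(spec_operations - observed),
    }


spec = {("GET", "/orders/{id}"), ("POST", "/orders"), ("GET", "/v1/reports")}
traffic = [
    ("GET", "/orders/1234"),
    ("GET", "/orders/1235"),
    ("POST", "/orders"),
    ("GET", "/debug/users/17"),
]
drift = find_drift(spec, traffic)
```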

The fix varies by what you find:

An undocumented endpoint that's still actively used needs to be documented, reviewed, and secured.
One that's receiving no traffic should be removed: not deprecated and left running, but actually decommissioned.
One that's receiving traffic from a single internal service should be access-controlled so only that service can reach it.

Zombie APIs, meaning endpoints that are technically live but have seen no legitimate traffic consistent with active use, still account for 17% of tracked endpoints in 2025, down from 36% in 2024.

Those that were once public but are past their intended deprecation date deserve particular attention. They frequently lack the security hardening applied to active endpoints, and they're often accessible using old, unrotated API keys. Traffic analysis is the only reliable way to find them, because they won't appear in your current documentation, and they won't show up in your gateway's registered service list if they were deployed before the gateway was in place.

API security monitoring vs. periodic scanning

A penetration test provides insight into your API's security posture at a specific point in time. It's valuable, but it doesn't tell you what's happening at 2am six months later when a new endpoint has been deployed and a new attack pattern has emerged.

Continuous monitoring is what bridges that gap. The distinction matters because APIs change constantly: new endpoints are deployed, response schemas evolve, and authentication logic is modified. A vulnerability that didn't exist last quarter can be introduced in a routine deployment.

What continuous monitoring covers that periodic scanning can't:

Requests that succeed when they shouldn't (authorization failures in production, not in a test environment)
Sensitive data appearing in response bodies that wasn't there before
New endpoints receiving traffic before they've been through a security review
Traffic patterns that indicate scanning or probing behavior from specific consumers

The operational question is what to do with this signal. Security monitoring that produces alerts nobody acts on is no better than no monitoring at all. The practical approach is to define a small number of high-confidence signals (authentication failures exceeding a threshold, a new endpoint with no auth receiving external traffic, a response containing what looks like a PII field not in the schema) and route those to a channel where someone will actually see them.
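One of those high-confidence signals, authentication failures exceeding a threshold, can be sketched as a count per consumer. The event shape (`consumer` and `status` keys) and the threshold value are assumptions for illustration.

```python
from collections import Counter


def consumers_over_threshold(events: list, threshold: int) -> list:
    """Consumers whose 401 count in this batch of events exceeds the threshold."""
    failures = Counter(
        event["consumer"] for event in events if event["status"] == 401
    )
    return sorted(
        consumer for consumer, count in failures.items() if count > threshold
    )


# One batch of events, for example the last five minutes of traffic.
recent_events = (
    [{"consumer": "key-A", "status": 401}] * 12
    + [{"consumer": "key-B", "status": 401}] * 2
    + [{"consumer": "key-B", "status": 200}] * 50
)
to_alert = consumers_over_threshold(recent_events, threshold=10)
```

Keeping the output to a short list of consumers, rather than one alert per failed request, is what makes the signal something a person will actually read.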

Incident response for API security events also deserves a documented procedure before you need it. When a breach happens, the questions that take the longest to answer are rarely technical. They're about scope: which endpoints were accessed, which consumer keys were involved, and which users' data was exposed. APIs with comprehensive request logging can answer those questions quickly; APIs without it face days or weeks of forensic reconstruction from fragmented server logs, as in the Lloyds Bank case.
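With structured request logs, the scope questions reduce to a filter and a few aggregations. A minimal sketch, assuming each log entry records an `api_key_id`, an `endpoint`, and the `resource_owner` whose data was returned; all three field names are hypothetical.

```python
def incident_scope(request_log: list, compromised_key: str) -> dict:
    touched = [
        entry for entry in request_log if entry["api_key_id"] == compromised_key
    ]
    return {
        "request_count": len(touched),
        "endpoints": sorted({entry["endpoint"] for entry in touched}),
        "affected_users": sorted(
            {entry["resource_owner"] for entry in touched if entry.get("resource_owner")}
        ),
    }


log = [
    {"api_key_id": "key-7", "endpoint": "GET /orders/{id}", "resource_owner": "alice"},
    {"api_key_id": "key-7", "endpoint": "GET /orders/{id}", "resource_owner": "bob"},
    {"api_key_id": "key-7", "endpoint": "GET /health", "resource_owner": None},
    {"api_key_id": "key-9", "endpoint": "GET /orders/{id}", "resource_owner": "carol"},
]
scope = incident_scope(log, "key-7")
```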

Treblle scans every request against 20+ security threat categories in real time, assigning threat level scores (low, medium, high) to each request. In the 2025 dataset, roughly 1% of requests were classified as high-threat, which sounds small until you account for scale: at 1 billion requests analyzed, that's approximately 9 million highly malicious requests, including SQL injection attempts, XSS payloads, and remote code execution probes.

The value of scanning 100% of traffic rather than sampling is that low-frequency attacks, including slow credential enumeration that stays below typical rate-limit thresholds, become visible in aggregate even when individual requests look unremarkable.

What a security posture score actually measures

A security posture score is an attempt to summarize an API's overall security health in a single number. Done well, it's a useful tool for prioritizing remediation work and for giving engineering leadership a clear picture of where the portfolio stands. Done badly, it's a compliance checkbox that produces false confidence.

Treblle's security scoring rates every API from 0-100 (A-F) across multiple dimensions:

authentication coverage,
authorization implementation,
data exposure in responses,
rate limiting presence, and
behavior against known attack patterns observed in live traffic.

The score incorporates both design-time analysis (does the OpenAPI spec define auth requirements?) and runtime analysis (do requests in production actually have authentication?).

The gap between design-time and runtime scores is often where the most actionable findings live. An API that scores well on its spec but poorly on its production traffic has implementation drift: the intended security controls aren't being applied consistently. An API with no spec but that appears well-behaved in traffic is underdocumented, which is a governance risk even if it's not an immediate security risk.

What posture scores shouldn't do is replace specific vulnerability findings. A high posture score doesn't mean there are no BOLA vulnerabilities. It means the signals Treblle can observe from the outside look healthy. Object-level authorization logic is something posture scoring can flag as absent or inconsistent, but the specific authorization decisions are in your application code, which requires code review or dynamic testing to fully evaluate.

Posture scores are most useful when tracked over time and across a portfolio. Treblle's 2025 data shows the Global API Scorecard sitting at 58/100, up from 50/100 in 2023: slow progress that reflects how difficult it is to move security metrics at scale. A portfolio view that shows a particular team's APIs consistently scoring below average tells you where governance attention is needed before an incident forces the question. The aggregate view is what turns security telemetry into engineering management information.

API security best practices: quick-reference checklist

Use this against your current API implementation. Items marked as runtime checks require observability tooling; items marked as design-time can be addressed during code or spec review.

Authentication

Every endpoint requires authentication unless explicitly designed to be public [design-time]
JWTs are validated for signature, algorithm, issuer, audience, and expiry [design-time]
API keys have defined expiry and scope restrictions [design-time]
API keys are never exposed in client-side code [design-time]
Token validation failures return 401, not 403 or 200 [design-time]

Authorization

Object-level authorization checks run on every request that accesses a resource [design-time]
Function-level access controls match the user's actual role, not just their authenticated state [design-time]
Admin and internal endpoints are not just UI-hidden but API-protected [design-time]

Input and output

Response payloads return only the fields the consumer needs, explicitly defined [design-time]
Input validation rejects malformed data at the API layer before it reaches business logic [design-time]
Error responses don't expose stack traces, internal IDs, or system information [design-time]

Rate limiting

All endpoints have rate limits, including authenticated ones [design-time]
Rate limit responses include Retry-After headers [design-time]
Expensive endpoints (search, export, bulk operations) have tighter limits [design-time]

Monitoring

All API traffic is logged with enough context to reconstruct a sequence of requests from a single consumer [runtime]
Authentication failures are tracked and alertable above defined thresholds [runtime]
New endpoints appearing in traffic trigger review before they receive significant volume [runtime]
Response bodies are scanned for sensitive data fields not in the schema [runtime]
Traffic patterns consistent with credential stuffing or resource enumeration trigger alerts [runtime]