Metrics

Metrics: Why You Should Care
Most metrics systems were built for infrastructure. They want you to track CPU load on a node you don't control, using query languages that require a cheat sheet. But you're building an app, not running a data center. You care about actual things:
- How many users signed up last week?
- What was the p95 response time for checkout?
- Are any customers getting hit with weird performance issues?
FlexLogs Metrics are made for you, the app developer.
They work like you think:
- Increments track when things happen: a signup, a login, an error. You just send a +1 each time.
- Observations measure things: how long something took, how big a payload was, how deep a queue got. You send the value directly.
And that's it. Two types. Real math. No agents, no exporters, no Prometheus arcana.
You get automatic rollups over time (1min, 5min, 1hr, etc.) with real stats:
- sum, count, avg, min, max
- median, p90, p95, p99
- std_dev for spread
All of it based on your actual data — not buckets or estimates.
Getting Started with Metrics
Setting up Metrics is as simple as tagging your logs. Here are some examples:
The most basic metric is an increment metric, which increases each time it's logged. The default value is 1. The following examples show how to create a couple of simple metrics.
// Create a basic (increment) metric
logger.info("flexlogs{metric: 'page-load'}");
// The same metric using the html-like format
logger.info("<flexlogs metric=page-load />");
// Create a basic (observation) metric
logger.info("flexlogs{metric: 'cpu-usage', type: 'observation', value: 0.75}");
Options
| Key | Description | Type / Options | Default |
|---|---|---|---|
| metric | A name for your metric | String | |
| type | The type of metric | increment or observation | increment |
| value | Metric value | Number | 1 |
| tags | List of tags | Array | [] |
Examples
// add some tags
logger.info("flexlogs{metric: 'order.new', tags: ['order', 'checkout']}");
// observation metric for monitoring queue length
logger.info(`flexlogs{metric: 'queue.length', value: ${ Queue.length() }, type: 'observation'}`);
// explicitly set type and value
logger.info("flexlogs{metric: 'user.signup', type: 'increment', value: 1}")
// build from json object
logger.info("flexlogs" + JSON.stringify({metric: "new-feature.enabled"}));
// same as -> logger.info("flexlogs{metric: 'new-feature.enabled'}")
// add multiple tags
logger.info("flexlogs{metric: 'beta_feature.error', tags: ['error', 'beta']}");
// interested in other examples? let us know!
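If you emit metrics from many call sites, you might wrap the string-building in a small helper, extending the JSON.stringify pattern above. This is a sketch, not part of FlexLogs itself, and trackMetric is a hypothetical name:
// Hypothetical convenience wrapper: FlexLogs only needs the flexlogs{...}
// marker in the log line, so a helper like this is purely optional.
function trackMetric(metric, options = {}) {
  logger.info("flexlogs" + JSON.stringify({ metric, ...options }));
}
// usage
trackMetric("order.new", { tags: ["order", "checkout"] });
trackMetric("queue.length", { type: "observation", value: Queue.length() });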
Working with Metrics
Aggregation Functions
Metrics are only useful if they help you make sense of behavior over time. That’s where aggregation functions come in. Instead of staring at raw values, we roll your data up across intervals — 1 minute, 5 minutes, 1 hour, and more — so you can see trends, catch outliers, and compare what's typical versus what's terrible.
Here are the key aggregation functions FlexLogs supports (and why you should care):
- sum — Adds up all values in the interval. Useful for things like "how many emails were sent?" or "how many signups happened this week?"
- count — Shows how many events were recorded in that interval. Handy for tracking throughput or overall volume.
- avg — Gives the mean value. Great for tracking averages: average response time, average retries per job, average cart value.
- min / max — Show the extremes. Want to know if a queue was ever empty? Or how bad a memory spike got? This tells you.
- median — The value right in the middle. More resistant to outliers than average, so it gives you a better feel for "typical" performance.
- p90, p95, p99 — The high-percentile views of your data. These reveal tail behavior: how bad is performance for your slowest users? How often are people waiting too long?
- std_dev — Standard deviation: how much the values bounce around. Useful for spotting instability or inconsistency (especially with durations).
Aggregation Examples: Request Duration
Imagine you track the duration of HTTP requests, and you see this set of values over a minute (all in ms):
[10, 25, 40, 2400]
- sum → 2475 ms. Total time spent handling requests. Could be useful for understanding compute cost or overall system load.
- count → 4. Number of requests in that window.
- avg → 618.75 ms. Sure, that seems high, but that's because of a single 2400 ms outlier. Average alone can mislead you.
- min / max → 10 ms / 2400 ms. Fastest vs. slowest. That's a massive range, which should make you ask why that one request was so slow.
- median → 32.5 ms. Much more reasonable than the average. Most requests were actually fast; the 2400 ms outlier is skewing things.
- p95 → ~2400 ms. p95 is the value that 95% of requests fall at or below. With only four samples it lands right at the outlier, and that slow tail is painful. p95 is your canary for bad user experience.
- std_dev → high. One slow request can wreck your average, and std_dev shows you the spread directly. If this number is large, things are inconsistent.
Why It Matters
You might not care about each individual request, but you do care about patterns:
- "How bad was traffic during the promo campaign?"
- "Are we getting slower over time?"
- "Was that deploy rough for anyone?"
- "Do users in a specific region always get p95 performance issues?"
FlexLogs gives you the power to ask and answer those questions, with real math and no guesswork. Because when you’re trying to improve your app, you can’t afford to fly blind.
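For instance, to make that last question answerable, you could attach a region tag to your duration observations. The region:eu-west tag format and the durationMs variable are just illustrative conventions, not FlexLogs requirements:
// Tag each observation with the user's region so you can later
// compare p95 across regions. The tag format is up to you.
const durationMs = 182; // however you measured the request
logger.info(`flexlogs{metric: 'request.duration', type: 'observation', value: ${durationMs}, tags: ['region:eu-west']}`);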
FlexLogs Metrics give you real insight without the infrastructure baggage. You get the data that matters, the way you already think about your app. No ragrets.