
It’s hard to get application performance monitoring (APM) tools using FOSS because of the complexity of writing libraries.

Ideally you could cover most things with FOSS. Use OpenTelemetry exporters into a time series DB of your choice. Use ELK or OpenSearch for logs, osquery for Linux, native cloud monitoring tools that come out of the box, etc.

Problem is it takes a lot of setup and maintenance. It can bite you back in the worst of times - when you’re trying to work an outage and find out alerting is dead because your fluentbit daemon is hung and no logs came through.




Or you can spend a day or two creating enough things to track the "end to end" numbers and debug problems the hard way when they appear.

Modern APM is completely out of proportion, and most people who use it absolutely do not get enough benefit to justify paying for those systems. My impression is this is yet another of those Gartner-created hypes that survive as a meme in upper-management minds and don't survive contact with the ground.


Worse yet, I saw a lot of APM-or-nothing mentality from CTOs & senior management, such that we skipped over very simple-to-implement, basic monitoring that catches the typical issues, because "APM is coming" and "we hired an SRE" and so on.

People want to chase the bright new thing that Gartner is pushing (it's always these guys).


> My impression is this is yet another of those Gartner-created hypes...

Bang on!

It's part of the same buzzword bullshittery of Kubernetes / cloud native / microservices / observability and so on. Create an out-of-proportion tech stack that no one understands, and then you need these APMs to monitor "p99 latencies" because who the fuck knows how an incoming HTTP request will get processed.


..k8s is buzzword bullshittery?

Look, there's a bunch of bullshit hype, but I think you're being overly reactionary.

Plenty of devs will never touch a profiler (whether as complex as a cloud-based APM or as simple as JFR); that doesn't mean profilers don't have their uses.

Plenty of buzzwords conflate hype and misunderstanding with useful technology:

'nosql'-> yeah, don't try to do OLAP on nosql, but document databases and KV stores are still super useful. It's basically a hashmap with some indexes, what's not to like? (Also, using 'nosql' as a term to differentiate between relational databases and... non-relational ones is already a misnomer.)

'big data'-> stupid hype at the beginning and yeah you can have a functioning product without it but try leading a business without any data to make a decision

'cloud'-> colocated data centers and timesharing have been useful things for decades, but man do I love having an API to formalize it rather than calling up the onsite data tech and having a phone call to do anything (that or a multi-day email chain)

'microservices'-> yeah, it's intractable if you atomize your app, but splitting application boundaries and defining contracts makes it a lot easier to horizontally scale, and it's cheaper in the end if you do it right. Also, functions as a service are great for glue code and infra IMHO.

The further I get into my career, the more I see that wading through the bullshit is needed to know wtf I'm talking about. It's similar to science communication, where the actual researchers figure out something cool and useful but situational and contextual, and then the zeitgeist distorts the findings beyond recognition.


To make it clear, monitoring p99 latency is very often a good thing.

You just don't need a fully featured APM package to do that. People have been doing that for decades, for example by running a script on their server logs.
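For what it's worth, the "script on your server logs" approach can be sketched in a few lines. This is only a rough illustration: it assumes each access log line ends with the request duration in seconds (you would have to configure your server to log that), and the function names `parse_latencies` and `percentile` are made up for the example.

```python
import math


def parse_latencies(lines):
    """Collect request durations, assuming the LAST field of each line is seconds.

    That log layout is an assumption for this sketch; adjust the parsing
    to whatever your server actually writes.
    """
    latencies = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            latencies.append(float(fields[-1]))
        except ValueError:
            continue  # skip lines that do not end in a number
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

Feed it the lines of yesterday's log file, call `percentile(latencies, 99)`, and you have a p99 without buying anything.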


> "for example by running a script on their server logs"

Exactly! There's lots of stuff you can do before you write a $1M check to one of these APM SaaS vendors.

Timestamp each hop. Do your percentiles on a trailing X day window each morning. Alert on worrying moves in the p99/95/whatever.
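A rough sketch of what that could look like, with everything in it being an assumption for illustration: hops report `(name, unix_seconds)` pairs, the daily p99 values are already computed by some other job, and "worrying" means today's p99 is more than 1.5x the median of the trailing window. Pick your own threshold.

```python
from statistics import median


def hop_durations(stamps):
    """Turn ordered (hop_name, unix_seconds) pairs into per-hop durations.

    Example key: "gateway->api". The hop names are whatever you logged.
    """
    return {
        f"{a}->{b}": t2 - t1
        for (a, t1), (b, t2) in zip(stamps, stamps[1:])
    }


def worrying_move(trailing_daily_p99s, today_p99, max_ratio=1.5):
    """True when today's p99 exceeds the trailing-window median by max_ratio.

    trailing_daily_p99s: one p99 value per day for the last X days.
    max_ratio: hypothetical threshold, tune it to your traffic.
    """
    if not trailing_daily_p99s:
        return False  # no baseline yet, nothing to compare against
    return today_p99 > max_ratio * median(trailing_daily_p99s)
```

Run it from cron each morning and send yourself an email when `worrying_move` returns True.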

Make sure you don't run out of disk/memory/database by like.. alerting when they get to 80/90/whatever %. Amazing footgun to not monitor this simple thing.
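The disk part is about as small as monitoring gets. A minimal sketch using Python's standard `shutil.disk_usage`; the 80/90 thresholds and the names `level` and `check_disk` are just placeholders for the example:

```python
import shutil


def level(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to 'ok', 'warning' or 'critical'."""
    if percent_used >= crit:
        return "critical"
    if percent_used >= warn:
        return "warning"
    return "ok"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (percent_used, level) for the filesystem containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0, "ok"  # pseudo-filesystem reporting no capacity
    percent_used = 100.0 * usage.used / usage.total
    return percent_used, level(percent_used, warn, crit)
```

Hook the "warning" and "critical" results up to whatever already pages you. Memory and database size follow the same pattern with a different data source.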

Have a process that serves as a reasonable proxy for "busy-ness" / bottlenecks, e.g. gateways. Monitor the % CPU over X-minute windows. If the CPU stays saturated for N minutes, something is bottlenecking.
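The "saturated for N minutes" check is just a rolling window. A sketch, assuming you already collect one CPU percentage per minute for the process from somewhere (top, your cloud provider, whatever); the class name and the 95% / 5-minute defaults are invented for the example:

```python
from collections import deque


class SaturationWatch:
    """Flags when every sample in the last `window` readings is at or above `threshold`."""

    def __init__(self, threshold=95.0, window=5):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # old samples fall off automatically

    def add(self, cpu_percent):
        """Record one reading and return whether the process is now saturated."""
        self.samples.append(cpu_percent)
        return self.is_saturated()

    def is_saturated(self):
        # Require a full window so a single hot sample after startup doesn't alert.
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and min(self.samples) >= self.threshold
```

One dip below the threshold resets the clock, which is usually what you want: a brief spike is not a bottleneck.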

Here's another secret - you want to also monitor for the opposite. Have processes that are "workers" of some sort that should always have load during business hours as there is always data flowing through? Monitor that their CPU doesn't fall BELOW some threshold (it means something fell over and they aren't doing work).
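The inverse check can be sketched the same way. Assumptions here: business hours are Monday to Friday, 09:00 to 17:00 in the server's local time, and a worker counts as "fallen over" when every recent CPU sample is below a 5% floor. All of those numbers and both function names are placeholders.

```python
from datetime import datetime, time


def in_business_hours(now, start=time(9, 0), end=time(17, 0)):
    """True on weekdays between start (inclusive) and end (exclusive)."""
    return now.weekday() < 5 and start <= now.time() < end


def worker_idle(samples, now, floor=5.0):
    """True when a worker that should be busy shows no CPU activity.

    samples: recent CPU percentages for the worker, e.g. one per minute.
    Only alerts during business hours, when data is expected to be flowing.
    """
    if not samples or not in_business_hours(now):
        return False
    return max(samples) < floor
```

Call it with `datetime.now()` and the last few minutes of samples. Outside business hours it stays quiet, so a dead-quiet Saturday night doesn't page anyone.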

None of this has to alert within milliseconds. Simply having it, and it alerting within a few minutes is better than not having it. Some of these things like latency trends can be run nightly, whatever.

That's an intern's work for a week or a senior's afternoon.

This is easy on s5s (servers). I'm sure you can do this in k8s or whatever too.


Java has an excellent open source APM - https://glowroot.org/.

Python and other tools - not so much.

I wouldn't call it Gartner-created hype. Glowroot has been invaluable to us. Of course we run a massive footprint of 0.5m - 0.75m JVMs.


> Of course we run a massive footprint of 0.5m - 0.75m JVMs

Yep, those APM packages are for places like yours. Not for most of the countless other places they get recommended to.


> fluentbit daemon is hung and no logs came through

Are you talking to me?



