|
No worries.
|
|
|
|
|
At the start of the work day we also require that their temperature be taken. Seems to be working so far. 
|
|
|
|
|
In addition to a third-party monitoring solution (Splunk) and "just keeping an eye on it", almost all of our apps have some kind of logging table in a database. We periodically run a query to check for any new error logs and investigate if something looks odd.
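A minimal sketch of that kind of periodic check, assuming a hypothetical `app_log` table with `id`, `logged_at`, `level` and `message` columns (the post doesn't describe the real schema or database, so SQLite stands in here). The idea is to remember the highest id already reviewed and only pull error rows newer than that.

```python
import sqlite3


def fetch_new_errors(conn, last_seen_id):
    """Return error rows with an id greater than last_seen_id, oldest first."""
    cur = conn.execute(
        "SELECT id, logged_at, level, message FROM app_log "
        "WHERE level IN ('ERROR', 'FATAL') AND id > ? ORDER BY id",
        (last_seen_id,),
    )
    return cur.fetchall()


def demo():
    """Build an in-memory database with a few made-up log rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app_log ("
        "id INTEGER PRIMARY KEY, logged_at TEXT, level TEXT, message TEXT)"
    )
    conn.executemany(
        "INSERT INTO app_log (logged_at, level, message) VALUES (?, ?, ?)",
        [
            ("2020-05-01T08:00:00", "INFO", "job started"),
            ("2020-05-01T08:00:05", "ERROR", "import failed"),
            ("2020-05-01T08:01:00", "FATAL", "worker crashed"),
        ],
    )
    return conn
```

Running `fetch_new_errors(demo(), 0)` returns the two error rows; passing the last id back in on the next run keeps the check from reporting the same rows twice.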
|
|
|
|
|
acomputerdog wrote: logging table in a database. We periodically run a query to check for any new error logs
Exactly.
|
|
|
|
|
Ditto with all my applications - but everything logged as an error is considered "odd" and investigated (by me). To expedite the situation, all logs of level "error" or "fatal" are also emailed to me, and often I'm able to contact the client to confirm there was an issue, but it's fixed now, before they've even noticed themselves.
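One way this could be wired up in Python's standard `logging` module, as a sketch only: a handler that ignores everything below ERROR and hands the rest to a `send` callable. `NotifyHandler`, `build_logger` and `send` are made-up names for illustration; in practice `send` would wrap whatever mail mechanism is in use (the standard library also ships `logging.handlers.SMTPHandler`). Note that Python calls the "fatal" level CRITICAL.

```python
import logging


class NotifyHandler(logging.Handler):
    """Forward records at ERROR or above to a send(subject, body) callable."""

    def __init__(self, send):
        super().__init__(level=logging.ERROR)
        self.send = send

    def emit(self, record):
        subject = f"[{record.levelname}] {record.name}"
        self.send(subject, self.format(record))


def build_logger(name, send):
    """Return a logger whose only handler is the notifier."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # keep the demo from echoing to the root logger
    logger.handlers.clear()    # avoid duplicate handlers on repeated calls
    logger.addHandler(NotifyHandler(send))
    return logger
```

With `send` set to something that appends to a list, an `info` call produces nothing, while `error` and `critical` calls each produce one notification.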
|
|
|
|
|
That's how I treat our newer apps, but unfortunately we have some old monoliths that generate "errors" even on successful activities. The effort to fix them is large enough to not be worth it, so I just have to deal with the extra logs until the old stuff is retired. At least I've been able to keep cleaner logs for the new stuff.
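If the legacy noise at least follows recognisable patterns, one way to "deal with the extra logs" without touching the monolith is to filter them on the reading side. A rough sketch using a standard `logging.Filter`; the class name and the example pattern are invented, since the post doesn't say what the bogus "errors" look like.

```python
import logging
import re


class KnownNoiseFilter(logging.Filter):
    """Drop records whose message matches any known-harmless pattern."""

    def __init__(self, patterns):
        super().__init__()
        self.patterns = [re.compile(p) for p in patterns]

    def filter(self, record):
        message = record.getMessage()
        # Returning False tells the logging machinery to discard the record.
        return not any(p.search(message) for p in self.patterns)
```

Attached with `handler.addFilter(KnownNoiseFilter([r"completed successfully"]))`, this would suppress "errors" that are really success messages and let everything else through. The obvious risk is a pattern that is too broad and hides a real failure, so the list is best kept short and specific.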
|
|
|
|
|
Since our applications control large manufacturing machines, when they go down our customers let us know. Quickly.
|
|
|
|
|
Very quickly. Been there, done that, got the t-shirt and the bruises. Hell hath no fury like a production stopped.
GCS d--(d+) s-/++ a C++++ U+++ P- L+@ E-- W++ N+ o+ K- w+++ O? M-- V? PS+ PE- Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X
|
|
|
|
|
It depends on the situation, the service, the budget, and the customer's needs.
Free comes first, paid second, in-house third, unless it's reusable/adaptable/resellable on its own (in which case it comes first and the others slide down).
|
|
|
|
|
I have error log files from each of the two system types where my stuff is running. They enumerate every little thing:
- Memory allocation failures (memory not available from the server's memory pool)
- SQL timeout messages (normally because some user wants everything from forever)
- Actual coding errors
The first is a combination of tweaking the server resources and accepting what the server jocks have allocated for that (what is considered 'critical') application webserver/php instance.
The second gets sent to our DBA - some are fixable, others require killing a user or two.
These two are a battle between resources, performance vs stupid.
The last is the goody! It tells you about coding errors that may not otherwise show. At this point, were it not for the first two, the error logs would almost always be empty. It's a help with development, too.
The logs are built in, and as the prophet Yogi Berra revealed: "You can see a lot just by looking."
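For anyone who wants the three buckets counted automatically before looking, here's a rough sketch. The regexes and the sample messages are guesses at what such log lines might contain, not the actual format of these logs; anything that matches neither of the first two rules is treated as a coding error.

```python
import re
from collections import Counter

# Checked in order; the first matching rule wins.
RULES = [
    ("memory", re.compile(r"memory|allocat", re.IGNORECASE)),
    ("sql_timeout", re.compile(r"timeout", re.IGNORECASE)),
]


def classify(line):
    """Return the bucket name for one log line."""
    for name, pattern in RULES:
        if pattern.search(line):
            return name
    return "code"


def summarize(lines):
    """Count non-blank log lines per bucket."""
    return Counter(classify(line) for line in lines if line.strip())
```

Feeding it a day's log gives a quick split between the resource problems and the entries that actually need a developer.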
Ravings en masse^ |
---|
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein | "If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010 |
|
|
|
|
|
As long as my mobile phone doesn't ring, all is working fine.
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
|
|
|
|
Then turn it off!
|
|
|
|
|
Shhhh.... don't tell it in public... they can be reading.
M.D.V.
|
|
|
|
|
We usually just ignore them until the customer reports a problem. Most of the time, problems center around what we call "transient data quality issues". Data we import via various mechanisms sometimes doesn't come in correctly, or entirely. We get notified about these problems by the flailing of hands on the part of our users.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
By analysis of the user complaints.
|
|
|
|
|
monitoring health? implies that your app can get sick, get cancer, possibly die even.
Humans trying to equate their software to human biology. Silliest thing ever.
your app either sucks or it doesn't. end of question.
I think stability is a better term, not "health".
|
|
|
|
|
Slacker007 wrote: I think stability is a better term, not "health".
Agree.
M.D.V.
|
|
|
|
|
Slacker007 wrote: Humans trying to equate their software to human biology. Silliest thing ever.
From Dictionary.com:
Kernel (noun)
- (Botany) a softer, usually edible part of a nut, seed, or fruit stone contained within its hard shell. The seed and hard husk of a cereal, especially wheat.
- The central or most important part of something.
- The most basic level or core of an operating system of a computer, responsible for resource allocation, file management, and security.
- (Linguistics) Denoting a basic unmarked linguistic string.
You were saying?
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Daniel Pfeffer wrote: You were saying?
I was saying how silly it is for a bunch of geeks and nerds to equate software to human biology.
|
|
|
|
|
My point was that many terms used in computing come from biology, and therefore using biological terms to discuss software is not out of line.
|
|
|
|
|
I think ultimately, I took offense to the term "health" when trying to describe the "stability" of a system or application.
A human is healthy/not healthy.
A program, system, network, application, web site is stable/not stable.
Oh, and I also think Kernel is a stupid name too. I can't do anything about it, but such is life.
|
|
|
|
|
...so my app cannot be healthy, but it can "suck." How ironic.
|
|
|
|
|
your app either sucks or it doesn't. end of question.
I dig it, but if we're opening the discussion around proper naming, this is definitely an oversimplification. My app can suck a bit in one part, be awesome in another.
I think stability is a better term, not "health".
Humans tend to anthropomorphise literally anything. While stability sounds to me more like a scalar unit of measure around the number of failures over time, a "service health dashboard" can be seen as a group of various telemetries from your application.
You clearly also don't like the term "bug" as it stems from something organic.
|
|
|
|
|
your app can get sick, get cancer, possibly die even.
Yep, absolutely it can.
My app gets some illness on a weekly basis. I've worked with cancerous code numerous times in my career. And Google lets a couple of its services die every year.
|
|
|
|