|
The last paragraph of that is pretty good and it's about twenty years old now.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
|
|
|
|
|
Interesting that you post this since I was going to suggest, to your previous thread about perf, that you dip down into ASM to wring out some speed.
And here you are.
(I've always admired those fluent in any dialect of ASM, but have never actually bothered trying to learn a single instruction. Maybe it's about time I spent a weekend diving in.)
cheers
Chris Maunder
|
|
|
|
|
I haven't even started to use that yet, because I'm trying the portable strpbrk() function over a memory mapped file first.
My results are fire.
Approx stack size of local JSON stuff is 176 bytes
Read 1290495 nodes and 20383269 characters in 249.894000 ms at 76.032238MB/s
Skipped 1290495 nodes and 20383268 characters in 33.278000 ms at 570.947773MB/s
utf8 scanned 20383269 characters in 75.141000 ms at 252.857960MB/s
raw ascii i/o 20383269 characters in 58.162000 ms at 326.673773MB/s
raw ascii block i/o 19 blocks in 3.130000 ms at 6070.287540MB/s
The "Skipped" line (bold in the original post) is where I search fast through a document.
Edit: Fixed a fencepost error in counting the position.
Edit 2: More complete benchmarks:
Query is $.season[7].episode[2].overview
Approx stack size of local JSON stuff is 152 bytes
Found "Labore magna sint occaecat ea officia labore sit voluptate ut fugiat. Nisi qui commodo consectetur officia incididunt anim do culpa eu. Eu ea magna aliqua excepteur et. Qui eiusmod irure adipisicing enim aute nostrud deserunt eiusmod quis culpa id.\r\n" and scanned 7149420 characters in 12.034000 ms at 498.587336MB/s
Query is $..id:
Approx stack size of local JSON stuff is 152 bytes
Found 40008 fields and scanned 20383269 characters in 65.563000 ms at 289.797599MB/s
Approx stack size of local JSON stuff is 176 bytes
Read 1290495 nodes and 20383269 characters in 256.696000 ms at 74.017515MB/s
Skipped 1290495 nodes and 20383269 characters in 33.527000 ms at 566.707430MB/s
utf8 scanned 20383269 characters in 72.913000 ms at 260.584532MB/s
raw ascii i/o 20383269 characters in 57.787000 ms at 328.793673MB/s
raw ascii block i/o 19 blocks in 3.106000 ms at 6117.192531MB/s
Edit: Found and fixed a bug with some escape characters not getting translated. (regression when I introduced my fast scanning)
Real programmers use butterflies
modified 27-Dec-20 22:19pm.
|
|
|
|
|
Too high level. Anyone remember:
C:\>debug
-D
0B06:0100 75 60 C6 46 00 00 8A 7E-04 F6 C7 04 74 E6 C6 46 u`.F...~....t..F
0B06:0110 00 02 8B 76 02 80 3C 00-74 4B B3 2E 34 00 F5 0A ...v..<.tK..4...
0B06:0120 B3 3A 38 5C FE 74 05 C6-46 00 01 4E 32 DB 86 1C .:8\.t..F..N2...
0B06:0130 E8 39 EB 3B D6 73 1B 56-51 8B CE 8B F2 AC E8 B2 .9.;.s.VQ.......
0B06:0140 E1 74 09 AC 3B F1 72 F5-59 5E EB 0B 3B F1 72 ED .t..;.r.Y^..;.r.
0B06:0150 59 5E 3A 5C FF 74 0E B4-3B CD 21 86 1C 73 95 E8 Y^:\.t..;.!..s..
0B06:0160 9B DA E9 C9 D7 E9 C3 D7-89 7E 02 80 46 01 0C B8 .........~..F...
0B06:0170 3F 2E B9 08 00 F3 AA 86-C4 AA 86 C4 B1 03 F3 AA ?...............
-
Now, those were the (so-called) good old days. :)
If you can keep your head while those about you are losing theirs, perhaps you don't understand the situation.
|
|
|
|
|
Yes. That reminds me of when I learned 6502 machine code before I realized I had a built-in mini-assembler.
Real programmers use butterflies
|
|
|
|
|
The first assembler I used worked by simply adding symbol values together. The instruction set was very regular (the CPU architecture was from the days long before microcode), so opcode, modifiers and offsets all had their fixed place in the instruction word.
We played around with this: to generate a MUL (multiply) instruction, you could instead write ADD ADD, as the opcode for MUL was twice the opcode of ADD.
|
|
|
|
|
If you consider one weekend enough...
Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger
|
|
|
|
|
Chris Maunder wrote: but have never actually bothered trying to learn a single instruction.
Allow me to get you started:
MOV Chris, Good_Book;
JMP ASM_PRO;
Ravings en masse^
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein
"If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010
|
|
|
|
|
I have a short attention span.
Does it contain pictures and large fonts?
cheers
Chris Maunder
|
|
|
|
|
Does what?
|
|
|
|
|
|
That is pretty interesting stuff! I used to be a fractal fanatic and spent a lot of time optimizing algorithms and investigating alternatives. Then I came across GPUs and CUDA and the search was over.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
|
|
|
|
|
You need to check out FRACTINT - more fractal than even a fanatical fanatic can handle. It just seems to have more and more features. The Wikipedia link hardly touches the surface.
|
|
|
|
|
Yes, I have it and it is quite good. I got lots of ideas from it.
For the highest performing fractal program I have ever seen - check out the Mandelbrot sample that comes with the CUDA SDK. It calculates in real time. You can pan and zoom and updates are instantaneous. It is really fast.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
|
|
|
|
|
SSE 4.2 is overrated, though. The instructions are neat, but they don't execute very quickly. There are also no AVX2 equivalents, just a VEX-encoded version of the 128-bit operations. Overall, SSE 4.2 usually doesn't work out that well, though it has niche uses, and the PCMPEQB & PMOVMSKB combo wins. It's a bit more boring perhaps, but just because something is made for a particular purpose doesn't make it the best for that purpose.
Glibc also uses generic instructions instead of the SSE 4.2 special stuff.
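For anyone who hasn't seen the PCMPEQB & PMOVMSKB combo, here is a minimal sketch of a byte search built on it. The function name `find_byte` is made up for illustration, the bit scan is a plain loop rather than a compiler builtin, and this is not how glibc actually writes it. On targets without SSE2 it simply falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: index of the first byte equal to `needle` in
   buf[0..len), or len if there is no such byte. */
static size_t find_byte(const unsigned char *buf, size_t len, unsigned char needle)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
#if defined(__SSE2__) || defined(_M_X64)
    const __m128i pattern = _mm_set1_epi8((char)needle);
    for (; i + 16 <= len; i += 16) {
        /* Unaligned 16-byte load, then compare all 16 lanes at once. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i eq = _mm_cmpeq_epi8(chunk, pattern);          /* PCMPEQB  */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq);      /* PMOVMSKB */
        if (mask != 0) {
            /* Lowest set bit marks the first matching byte in the chunk. */
            size_t bit = 0;
            while ((mask & 1u) == 0) {
                mask >>= 1;
                ++bit;
            }
            return i + bit;
        }
    }
#endif
    /* Scalar tail (and the whole search when SSE2 is unavailable). */
    for (; i < len; ++i) {
        if (buf[i] == needle)
            return i;
    }
    return len;
}
```

Searching for several different bytes at once means one compare per candidate byte, OR-ed together before the movemask; real code would also replace the bit-scan loop with a count-trailing-zeros instruction.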
|
|
|
|
|
In the end I didn't have to worry about it. I found out how optimized strpbrk() is and I'm using it over a memory mapped file. I'm searching through JSON, picking out fields at about 560MB/s now.
That's satisfying enough, and more portable. (Memory mapped stuff isn't 100% portable, but I have code for Windows and, I think, either POSIX or Linux, so it works with both and falls back otherwise.)
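As a rough illustration of the strpbrk() approach (a minimal sketch, not the actual library code), the loop below hops from one "interesting" byte to the next. The helper name `count_any_of` is invented for the example, and it assumes the buffer is NUL-terminated, which a memory mapped file is not guaranteed to be unless you arrange for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how many bytes of the NUL-terminated buffer
   `text` belong to the set of characters in `set`, letting strpbrk() do
   the scanning between hits. */
static size_t count_any_of(const char *text, const char *set)
{
    assert(text != NULL && set != NULL);
    size_t hits = 0;
    /* strpbrk() returns a pointer to the first byte of `text` that is in
       `set`, or NULL when none is left; resume one byte past each hit. */
    for (const char *p = strpbrk(text, set); p != NULL; p = strpbrk(p + 1, set))
        ++hits;
    return hits;
}
```

For example, `count_any_of(json, "{}[]")` counts the structural braces and brackets in a document; a skipping parser would act on each hit instead of just counting it.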
Real programmers use butterflies
|
|
|
|
|
Edit: This might be useless. I just found out simdjson's new API supports demand streaming like my library does. The only advantage mine has is it will run on tiny devices.
---------- SNIP
and it's called eliminating branches.
I'm creating a UTF-8 decoding routine based on some wicked code I found here:
branchless-utf8/utf8.h at master · skeeto/branchless-utf8 · GitHub
Black magic it is. I wouldn't be surprised if the compiler vectorizes this and turns it into SIMD instructions.
So now I just need to do some 4-byte windowing without branches. I think I just need between 1 and 4 bit shifts to do it. I need to think on it though. I love this stuff.
I've already sped it up:
utf8 scanned 20383269 characters in 120.909000 ms at 157.142975MB/s
It could be a lot faster though: (currently my code branches)
raw ascii i/o 20383269 characters in 57.367000 ms at 331.200865MB/s
This is the holy grail. I'd be happy with 1GB/s:
raw ascii block i/o 19 blocks in 3.070000 ms at 6188.925081MB/s
I'm using memory mapped files now for speed, which helps significantly, but branching is killing me in my decoding routine.
Optimizing still matters.
Watch me hit 1GB/s soon... hopefully.
The reason I'm doing this, by the way, is that I think I have a better high-level algorithm for processing JSON. I've built a better mousetrap, I'm pretty sure.
The problem is that to bench it I need to compete with simdjson at a lower level (I/O and decoding), since it's the gold standard in performance for JSON processing so far.
Real programmers use butterflies
modified 27-Dec-20 12:27pm.
|
|
|
|
|
Pretty cool stuff you're doing here. Was this all spawned by your pump monitoring project?
"I didn't mention the bats - he'd see them soon enough" - Hunter S Thompson - RIP
|
|
|
|
|
Thank you.
It wasn't exactly.
I wanted a JSON processor that was better about memory use than ArduinoJSON, and more portable. I've accomplished that in spades, but now I want to make it faster.
Real programmers use butterflies
|
|
|
|
|
You do realise you are mad
"I didn't mention the bats - he'd see them soon enough" - Hunter S Thompson - RIP
|
|
|
|
|
Quite.
Real programmers use butterflies
|
|
|
|
|
I'll be trying to build a screen capture utility for users to make error reports. Not very complex: they'll click an icon and it will grab the entire screen (with error message(s)) and email (SMTP) it to IT Support. To be built with VS2015. The user's only interaction will be clicking the icon; it will not do any sort of screen display (no sizing or any of that required).
I googled a bit and there are quite a few free libraries out there - has anyone here used any of them, and if so, can they recommend a library (or warn me away)?
This is a bit of a COVID fallback position: the normal portal I built them allows for a user to submit a problem for another user (such as in an adjacent cubicle) - not nearly so useful when people are working from home.
[EDIT]
It worked by adapting "How to capture the current screen using Windows Forms" (DotNetCurry) to VS2015. The screen is grabbed before the window opens, and the window serves as an acknowledgement to the user that something happened. This code is from a much older version of VS, before gcnew, '^', and a few other things that are no longer valid.
[/EDIT]
modified 4-Jan-21 14:30pm.
|
|
|
|
|
Windows comes with a snipping tool. Just have them use that.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
I can't even get our field service people to use that. I consistently get error reports with 'screen captures' that were taken with a phone camera from three feet away.
Software Zen: delete this;
|
|
|
|