The Lounge is rated Safe For Work. If you're about to post something inappropriate for a shared office environment, then don't post it. No ads, no abuse, and no programming questions. Trolling (political, climate, religious, or whatever) will result in your account being removed.
I thought I'd make it festive and pretty simple - up to you if you want to continue the theme or not!
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
That's a much better version. Everything is better with cats!
"Anything that is unrelated to elephants is irrelephant" - Anonymous
"The problem with quotes on the internet is that you can never tell if they're genuine" - Winston Churchill, 1944
"Never argue with a fool. Onlookers may not be able to tell the difference." - Mark Twain
I rewrote my JSON reader because the code was filthy on account of being a port from C#, and not being fully designed for what it ended up doing.
I also wanted to make sure it compiled and ran on the ATmega2560, my test 8-bit processor with 8kB of SRAM. Since I tightly control allocations, I can put a ceiling on the RAM use wherever I want - so if I want to test it limited to 2kB, I can.
It screams on 64-bit and crawls on 8-bit, but it runs.
It's kind of clever how I did it. You create extractors to pull data from your JSON dataset, and it directs the pull parser along the proper path, retrieving all the values for you super efficiently.
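Purely as an illustration of the extractor idea described above (all names here are invented, not the library's actual API), an extractor can be modeled as a small table mapping field names to capture slots, which the parser fills in a single pass regardless of field order:

```cpp
#include <cassert>
#include <cstring>

// Invented sketch of the extractor concept: a list of (field name -> slot)
// entries the parser consults as it walks an object. Not the real API.
struct Extraction {
    const char* field;   // field to capture
    const char** slot;   // where to store the value when found
};

// Walk a flat array of key/value pairs (standing in for pull-parser events)
// and fill any matching slots, in one pass, regardless of field order.
void run_extractions(const char* keys[], const char* vals[], int n,
                     Extraction* ex, int exCount) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < exCount; ++j)
            if (strcmp(keys[i], ex[j].field) == 0)
                *ex[j].slot = vals[i];
}
```

The real thing directs a pull parser over a stream rather than scanning an in-memory array, but the shape of the idea is the same: declare what you want up front, then make one pass.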
Anyway, I'm patting myself on the back because this is pretty cool.
Although I'm running down a bug right now, and because everything is streaming it's really difficult to debug sometimes. For example, I can never see the entire field name I'm comparing against when I'm looking for a field because I do all comparisons inline to avoid loading the entire string into RAM.
This is fun to code, and I really think I did invent a better mousetrap.
I'm a long way from my next article but I'm pretty happy with this so far.
I took a brief look at your project. Looks like you had a lot of fun working on it. I can see that you are using a compiler that supports at least C++14 just by looking at some of the features you are using. But it seems you are mostly writing in just C.
I suspect that you could make your library much simpler by using some C++17 features. You should be able to attach a std::basic_streambuf directly to your allocator buffer. You could then parse it with the C++17 std::basic_string_view.
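A rough sketch of that suggestion, assuming C++17 (the class name is illustrative): a trivial `std::streambuf` subclass can point its get area at an existing buffer with `setg`, giving you stream semantics with no copy, while a `std::string_view` slices the same bytes for comparisons.

```cpp
#include <cassert>
#include <istream>
#include <streambuf>
#include <string_view>

// Wrap an existing allocator buffer in a streambuf without copying.
// setg() just points the get area at the caller's bytes.
struct BufferStreambuf : std::streambuf {
    BufferStreambuf(char* buf, std::size_t len) { setg(buf, buf, buf + len); }
};
```

Usage might look like:

```cpp
char buf[] = "{\"name\":\"Sea Change\"}";
BufferStreambuf sb(buf, sizeof buf - 1);
std::istream in(&sb);                       // stream view, zero copy
std::string_view view(buf, sizeof buf - 1); // string_view over the same bytes
```

Whether that pays off on an AVR target is another question, since `<streambuf>` drags in a fair amount of library code.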
You're looking at my old, messy codebase. My new one is cleaner and more efficient. At any rate, I'm actually targeting C++11 (maybe 14?), I'm pretty sure - whatever the Arduino IDE uses - and I don't have access to a lot of the C++ standard stuff like the STL.
That's why it looks Cish. Arduino code is C++ but tends to be Cish due to lack of program space for all the extra code that templates tend to introduce.
Edit: to add, the difficulty isn't in managing pointers and data structures for me - that's second nature. That's where a lot of the full C++ features can help, but frankly, I can do without them when necessary; I use them when I have them, for the most part. That said, the difficulty in this project was reducing RAM use by streaming everything and doing things like inline string compares right off the "filestream".
Yeah, I'm just sort of avoiding the STL altogether (except for some standard type definitions, which Arduino has as well) rather than both introducing confusion over what's there and what's not, and potentially introducing code bloat via templates. On these little machines every byte of program space counts.
"On these little machines every byte of program space counts"
Yeah, some years ago I had to write my own TCP/IP stack for a Rabbit Semiconductor board. I remember having to support IEEE 802.1Q, so I ended up with a single-char buffer, which was enough room for parsing one packet at a time.
Nice. Right now I'm thrilled with myself, because after several innovations I was able to cram query results into dozens of bytes per row (and I only needed one row at a time). Basically, the JSON reader *only* needs enough space in the capture for the largest string you're actually going to examine - you skip everything else, scanning forward through the document until you find what you want, and then you "extract" it. Extractions are extremely efficient nested structures that let you direct the pull parser to retrieve certain elements. That way you can, for example, retrieve 3 relevant fields off of an object regardless of field order, and one of those fields could have subobjects or arrays that get values extracted from them.
It does this all in one pass, and then returns the reader to a known good position after it has read over the necessary bit of the document. It returns tight structures representing all the data it captured in highly efficient recyclable memory pools, which you can free any time you no longer need them, at almost no cost.
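The "recyclable pool" part of that can be illustrated with a plain bump allocator (this is an invented sketch, not the article's actual classes): allocation is pointer arithmetic, and "freeing everything" is just resetting one counter, which is why it costs almost nothing.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative fixed-size capture pool. alloc() bumps a pointer; recycle()
// frees every capture at once by resetting it.
template <std::size_t N>
struct CapturePool {
    unsigned char bytes[N];
    std::size_t used = 0;
    void* alloc(std::size_t n) {
        if (used + n > N) return nullptr;  // out of memory: caller must recover
        void* p = bytes + used;
        used += n;
        return p;
    }
    void recycle() { used = 0; }  // near-zero-cost "free all"
};
```

The tradeoff is that individual captures can't be freed piecemeal - but for one-row-at-a-time query results, that's exactly the lifetime you want.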
S07E09 Bitter Pill
S07E10 Things Unseen
S07E11 Tipping Point
S07E12 Sea Change
Scanned 112 episodes and -5320 characters in 9425 milliseconds using 23 bytes of LexContext and 50 bytes of the pool
Forgive the negative value - the little 8-bit proc can't handle a number in the 190k range being dumped to the serial port; it overflows and goes negative, is all. Otherwise everything there is accurate. It took a total of (not counting incidental locals on the stack) 73 bytes to extract an episode_number, season_number and name from each of the 112 episodes in the 200k Burn Notice series JSON document.
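For the curious, that negative number is just 16-bit truncation: on AVR a plain `int` is 16 bits, so a running count near 190k wraps around. The wraparound can be reproduced portably with a cast (the helper name here is made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Emulate what happens when a large count is stored in an AVR `int`:
// the value is truncated to 16 bits, and anything past 32767 reads
// back as negative.
int16_t as_avr_int(long value) {
    return (int16_t)(uint16_t)(value & 0xFFFF);
}
```

For example, a count of 191,288 characters truncates to exactly the -5320 shown in the log above (191288 mod 65536 = 60216, and 60216 - 65536 = -5320).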
The reason it matters is that the source JSON document I'm using is actually composed of several very "chunky" online dumps of the sort you tend to get from MongoDB repositories. I got the dumps from tmdb.com. Each HTTP request generates lots of data, and you have to pick through it - which is what this is ultimately for - but it works on files too.
I have not looked in depth at your JSON parser, but I imagine it's probably a sliding-window state machine, because that's how I would do it. But that means your lib has some limitations you're not telling us about - such as the window size. Does it continue parsing on window overflow or bail out?
I'm too lazy to create a solution/project for your code. If you update your article and add that I'll take a closer look.
When I was looking yesterday... it appeared to be completely cross-platform...
1. You define the window size
2. It's quite miserly about how it uses it. In my last example it only used 23 bytes of that window, because 90% of the data it examines only one character at a time. The remainder is the stuff you actually request.
3. On an out-of-memory error it halts the parse and (hopefully*) tries to recover to a known good position, like advancing past the remainder of a string value it got stuck in the middle of. Calling one of the navigation methods again should resume it. Eventually I may make a way to read values in chunks, but for now I'm not worried about it; mostly it would be for processing things like base64 blobs.
* This is a source of bugs, I'm sure. There are simply so many corner cases that there's no way I'll catch them all until the library matures. If you've written parsers, you know how easy it is (sometimes) to do naive "panic mode" recovery, where you just throw away symbols until you reach a known good state, and how hard it is to do actual good error recovery. In this case, since it's a pull parser, "panic mode" isn't an option because I don't keep a stack.
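The string-recovery case mentioned above might look something like this sketch (hypothetical names, with the input source again standing in for the real stream): skip the rest of the string value, honoring backslash escapes, so the next navigation call starts on a clean boundary.

```cpp
#include <cassert>

// Stand-in for the streaming input source.
struct Source {
    const char* s;
    int next() { return *s ? (unsigned char)*s++ : -1; }
};

// Advance past the remainder of a JSON string value the parser got stuck
// inside. Returns true if the closing quote (a known good position) was
// found, false if input ran out first.
bool skip_rest_of_string(Source& src) {
    for (int c = src.next(); c != -1; c = src.next()) {
        if (c == '\\') { src.next(); continue; }  // skip the escaped char
        if (c == '"') return true;                // clean boundary reached
    }
    return false;
}
```

The corner cases live in exactly this kind of routine - escapes, truncated input, recovery starting mid-escape - which is why it's hard to be confident without a mature test suite.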
Eventually I may make a way to read values in chunks
I guess if it's a garage hobby project it doesn't really matter, but your lib would need to handle chunked transfers and HTTP 206 (Partial Content) responses in the real world. There are many HTTP/S servers that completely ignore client headers such as Accept-Encoding and send a gzip stream anyway. Some will send a 206 and give you half the data you requested out of the blue.
For example Varnish Cache[^] always sends a 206 even when it sends you the whole file! Laymen believe that internet software is orderly and structured but it's really complete anarchy out there in the real world.
That's transport-layer stuff my lib doesn't deal with. It knows nothing of files, HTTP, or sockets. I have a class I derive from and implement a read() method on to make it work for various input sources. You'd want one that supported HTTP properly.
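That pattern - the parser knows only about a read() method, with transports layered underneath - might look like this (illustrative names, not the library's actual classes):

```cpp
#include <cassert>

// The parser sees only this interface; it knows nothing about where the
// bytes come from.
struct InputSource {
    virtual ~InputSource() {}
    // Returns the next byte, or -1 at end of input.
    virtual int read() = 0;
};

// Example transport: an in-memory buffer. An HTTP-aware source would deal
// with chunking, gzip, and 206 responses here, before the parser ever
// sees a byte.
struct MemorySource : InputSource {
    const char* p;
    explicit MemorySource(const char* s) : p(s) {}
    int read() override { return *p ? (unsigned char)*p++ : -1; }
};
```

So handling the messy HTTP behavior described above would be a matter of writing one well-behaved source class, not changing the parser.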