Sometimes it's just better to load "all" the data
8 points by abnercoimbre
One of the first tasks I had in my current role was to fix an automation for pulling metadata from specific Azure VMs. The existing process was a PowerShell script that ingested a spreadsheet containing one VM name per line and then called Get-AzVm on each one.
The spreadsheet would be hundreds or thousands of lines long, and with each Get-AzVm call taking about 3 seconds, the script would often take more than an hour to finish.
I inverted the process: run Get-AzVm with no arguments to get every VM in the subscription, build a hashmap of Name -> VM object, then ingest the spreadsheet and use each name to pull the relevant object out of the hashmap. It went from a few thousand API calls to just one, and the script now takes ~5 seconds to run.
It's kind of wild that Get-AzVm takes several seconds per call, though. The latency obviously adds up over thousands of calls, but I'd expect each individual call to be an order of magnitude faster.
The Azure CLI is just extremely slow in general. No matter what you do with it, every invocation takes at least several seconds (and sometimes dozens) to complete.
Edit: Even just running az --help takes over half a minute to run on my work laptop (M4 Pro):
$ time az --help
<output snipped>
az --help 4.36s user 3.07s system 21% cpu 34.634 total
Like awscli, it looks like it's implemented in Python, which could explain some of the startup-time woes.
A similar example: I had a not-very-large JSON body to parse. The streaming parser was quite a bit slower than the "read from string" parser in the same library (serde).
I had a similar issue once. It turned out the reader I passed into the deserializer wasn't buffered. Wrapping it in a BufReader made it on par with deserializing from a string or slice.