
After investigating an issue with Azure Streaming Analytics, we discovered it cannot deserialise JSON payloads that contain property names differing only in case, e.g.
{
"Proxy": "abc",
"proxy": "def"
}
If you send the above payload to a Streaming Analytics job, it will fail with the following error:
Source ‘<unknown_location>’ had 1 occurrences of kind ‘InputDeserializerError.InvalidData’ between processing times ‘2020-03-30T00:19:27.8689879Z’ and ‘2020-03-30T00:19:27.8689879Z’. Could not deserialize the input event(s) from resource ‘Partition: [8], Offset: [1], SequenceNumber: [1]’ as Json. Some possible reasons: 1) Malformed events 2) Input source configured with incorrect serialization format
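For context, here is a minimal sketch of how such a payload can be sent to the Event Hub input of a Streaming Analytics job. It assumes the Azure.Messaging.EventHubs SDK; the connection string and hub name are placeholders.

using System.Text;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

// Placeholder connection details; substitute your own namespace and hub
await using var producer = new EventHubProducerClient(
    "<event-hub-connection-string>", "<event-hub-name>");

// The payload Streaming Analytics cannot deserialise
var payload = "{\"Proxy\": \"abc\", \"proxy\": \"def\"}";
await producer.SendAsync(new[] { new EventData(Encoding.UTF8.GetBytes(payload)) });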
We opened a ticket with Microsoft. This was the response.
“Hi Romiko,
Thank you for being patience with us. I had further discussion with our ASA PG and here’s our findings.
Findings
ASA unfortunately does not support case sensitive column. We understand it is possible for json documents to add to have two columns that differ only in case and that some libraries support it. However there hasn’t been a compelling use case to support it. We will update the documentation as well.
We are sorry for the inconvenience. If you have any questions or concerns, please feel free to reach out to me. I will be happy to assist you.”
Indeed, other libraries do support this, such as PowerShell, C#, and Python.
C#
using Newtonsoft.Json;

// Two properties that differ only in case deserialise without error
var data = "{\"name\": \"Ashley\", \"Name\": \"Romiko\"}";
dynamic message = JsonConvert.DeserializeObject(data);
var text = JsonConvert.SerializeObject(message);
// text: {"name":"Ashley","Name":"Romiko"}
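Newtonsoft.Json handles this because JObject property names are case-sensitive, so both name and Name survive the round trip intact.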
As a workaround for streaming data to Elastic, I have built the following tool on GitHub: https://github.com/Romiko/EventHub-CheckMalformedEvents
A significant reason why Microsoft should support it is the Elastic Common Schema (ECS), a specification that provides a consistent and customizable way to structure your data in Elasticsearch, facilitating the analysis of data from diverse sources. With ECS, analytics content such as dashboards and machine learning jobs can be applied more broadly, searches can be crafted more narrowly, and field names are easier to remember.
When introducing a new schema, there is always the question of how to deal with existing/custom data. Elastic have an ingenious way to solve this: all fields in ECS are lowercase, so your existing data is guaranteed not to conflict if you capitalise your custom fields.
Let us reference Elastic’s advice:
https://www.elastic.co/guide/en/ecs/current/ecs-custom-fields-in-ecs.html
Elastic, who deal with big data all the time, recommend using Proxy rather than proxy to ensure a migration to ECS is a viable, conflict-free solution.
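As a rough illustration of that advice, here is a hypothetical sketch using Newtonsoft.Json’s JObject (the helper name is made up) that capitalises top-level custom field names so they can never collide with the all-lowercase ECS fields:

using Newtonsoft.Json.Linq;

// Hypothetical helper: capitalise top-level custom field names so they
// cannot collide with the all-lowercase ECS field names.
static JObject CapitaliseCustomFields(JObject source)
{
    var result = new JObject();
    foreach (var property in source.Properties())
    {
        var name = char.ToUpperInvariant(property.Name[0]) + property.Name.Substring(1);
        result[name] = property.Value;
    }
    return result;
}

// {"proxy":"def"} becomes {"Proxy":"def"} and no longer shadows an ECS field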
Conclusion
If you are migrating huge amounts of data to the Elastic Common Schema (ECS), consider whether Azure Streaming Analytics is a good fit, given this JSON limitation.
You can also vote to fix this issue here and improve Microsoft’s product offering:
https://feedback.azure.com/forums/270577-stream-analytics/suggestions/40122079-azure-stream-analytics-to-be-able-to-handle-case-s