Last week, I had two presentations, during both of which I was to present an example of data synchronization using the open-source framework we developed in our office, Xari/BitFrameworks. you can read more about the framework here https://github.com/egarim/SyncFramework

I practiced the demo several times and felt confident about it. After all, I was the one who made the last commit to the source. When the conference began, it was my time to shine. I eagerly spoke about the merits of the framework, then it was time for the technical demo.

Everything started well, but after a while, the framework stopped synchronizing data. I was baffled; I had practiced my demo multiple times. What could have gone wrong?

At the end of the demo, I did what everyone does when their presentation fails—I blamed the demo gods.

After resting for a few hours, I reset my computer and cleaned the Visual Studio solution. I practiced several more times and was ready for round two.

I repeated my presentation and reached the technical demo. There was nothing to fear, right? The previous failure was just a fluke… or so I thought. The demo failed even earlier this time. I cursed the demo gods and finished my presentation.

It was a hard week. I didn’t get the dopamine rush I usually got from presentations. It was a rainy Saturday morning, perfect weather to contemplate my failure. I decided not to give up and wrote some integration tests to determine what went wrong.

So why did my demo fail? The demo was about database synchronization using delta encoding theory. For a successful synchronization, I needed to process the deltas in the exact order they were sent to the server. I had a GUID type of index to sort the deltas, which seemed correct because every time I ran the integration test, it worked fine.

Still puzzled by why the demo failed, I decided to go deeper and wrote an integration test for the GUID generation process. I even wrote an article about it, which you can read here.

On my GUID, common problems using GUID identifiers

 

Now I was even more puzzled. The tests passed sometimes and failed other times. After a while, I realized there was a random element in how GUIDs are generated that introduced a little-known factor: probabilistic errors.

Probabilistic errors in software refer to a category of errors that occur due to the inherent randomness and uncertainty in some algorithms or systems. These errors are not deterministic, i.e., they may not happen every time the same operation is performed. Instead, they occur with a certain probability.

Here are a few examples of situations that could lead to probabilistic errors:

1. **Concurrency and Race Conditions**: In concurrent systems, the order in which operations are executed can be unpredictable and can lead to different outcomes. If the software is not designed to handle all possible execution orders, it can lead to probabilistic errors, such as race conditions, where the result depends on the timing of certain operations.

2. **Network Failures and Distributed Systems**: In distributed systems or systems that rely on network communications, network failures can lead to probabilistic errors. These can include lost messages, delays, and partial failures. As these events are inherently unpredictable, they can lead to errors that occur with a certain probability.

3. **Randomized Algorithms**: Some algorithms, such as those used in machine learning or optimization, involve an element of randomness. These algorithms can sometimes produce incorrect results due to this randomness, leading to probabilistic errors.

4. **Use of Unreliable Hardware**: Hardware failures can lead to probabilistic errors. For example, memory corruption, disk failures, or unreliable network hardware can cause unpredictable and probabilistic errors in the software that uses them.

5. **Birthday Paradox**: In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday. Similarly, when generating random identifiers (like GUIDs), there is a non-zero chance that two of them might collide, even if the chance is extremely small.

Probabilistic errors can be difficult to diagnose and fix, as they often cannot be reproduced consistently. They require careful design and robust error handling to mitigate. Techniques for dealing with probabilistic errors can include redundancy, error detection and correction codes, robust software design principles, and extensive testing.

So tell me, have it ever happened to you? how did you detected the error? and how did you fix it?

Until next time, happy coding!!!!