xx network Economic Tweaks – Realtime Failure Deductions


Based on community feedback, the team has a proposal for the network's economics and is looking to the community to help make the final decision.

An issue that has come to light with the launch of the xx network MainNet is that there are some lower performing nodes running on the network.

In the previous incarnations of the network (AlphaNet, BetaNet, and ProtoNet), the team handled these issues by simply disabling poorly performing nodes. That is not possible on MainNet, which is controlled by nPoS.

The community has been discussing this issue since the MainNet launch (you can find a very thoughtful thread in the #MainNet-chat channel on the Discord). Solutions involving sliding scales of penalties, among others, have been proposed. Overall, it is the team’s opinion that a tweak to the existing solution, along with a client-side modification, is the right approach.

To understand the current solution, some details of the economics need to be reviewed. In each epoch (a 24-hour period) an amount of coins is awarded (the mechanism for deciding this amount is described in the xx economics paper) and distributed among all nodes. The portion of these coins awarded to a given node is equal to its portion of the total points earned. For example, if the award in an epoch totals 50,000xx, and a specific node earned 10,000 points out of a total of 10,000,000, it will get (10,000/10,000,000)×(50,000xx) = 50xx (to be split between the validator and nominators).
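As a minimal sketch of the proportional split described above, the calculation can be written as follows. The function name and signature are illustrative only and are not taken from the xx network codebase.

```python
def node_reward(epoch_award: float, node_points: int, total_points: int) -> float:
    """Coins a node receives for one epoch, proportional to its share of all points."""
    if total_points <= 0:
        # No points were earned network-wide, so there is nothing to split.
        return 0.0
    return epoch_award * node_points / total_points


# Figures from the example: a 50,000 xx epoch award, and a node holding
# 10,000 of the 10,000,000 points earned across the network.
reward = node_reward(50_000, 10_000, 10_000_000)
print(reward)  # 50.0
```

The result is the amount later split between the validator and its nominators.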

But how are these points earned?

Points are earned for two things within the xx network: making blocks and running cMix rounds. It is within the mechanism for running cMix rounds that the incentivization scheme lies.

Whenever a round is completed, all 5 nodes in the team earn 10 points, while when a round fails during the realtime phase the nodes lose 20 points. This loss of points is to disincentivize bad behavior. At first this seems very unfair – if a round fails due to one node, why should every node be penalized?

There are two reasons:

The first is that under BFT (Byzantine Fault Tolerance) it is not possible to determine who is at fault within the cMix protocol. This means it is not possible to prove which node should lose points.

The second is that, in the aggregate, other nodes are not penalized. For example, imagine a network with 15 nodes and 5-node teams where all nodes but one cause 0% of rounds to fail, and the remaining node causes 50% of its rounds to fail. Given enough rounds, all good nodes will work with the bad node equally often, and because coins are distributed based on each node's ratio of the total points, not its absolute points, all “good” nodes will get the same number of xx coins, while the bad node will be penalized. In the above case, a node will be in a team with the bad node 1/3rd of the time. The bad node has a 50% failure rate, so every good node will have an aggregate failure rate of 16.667%. Assuming 100,000 rounds, with each node participating in 1/3rd of them, under the current economics each good node will earn 10×100,000×⅓×(1−16.667%) = 277,778 points and lose 20×100,000×⅓×(16.667%) = 111,111 points, for a total of 166,667 points. In the same scenario, the offending node will earn 10×100,000×⅓×(50%) = 166,667 points and lose 20×100,000×⅓×(50%) = 333,333 points, for a total of -166,667. Points cannot go negative, so the offending node gets 0 points, and as a result all rewards are split evenly between the other 14 good nodes, as if the offending node was never there.
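The arithmetic in this example can be reproduced with a short sketch. It uses the same simplifications as the text (each node plays 1/3rd of rounds, and a good node shares a team with the bad node 1/3rd of the time); the function and variable names are made up for illustration.

```python
def net_points(rounds: int, participation: float, failure_rate: float,
               success_points: int = 10, failure_penalty: int = 20) -> float:
    """Net points for a node, floored at zero because points cannot go negative."""
    played = rounds * participation
    earned = success_points * played * (1 - failure_rate)
    lost = failure_penalty * played * failure_rate
    return max(0.0, earned - lost)


ROUNDS = 100_000
PARTICIPATION = 1 / 3        # each node takes part in a third of all rounds
BAD_NODE_FAILURE = 0.50      # the offending node fails half of its rounds

# A good node only fails when teamed with the bad node (1/3rd of the time).
good_node_failure = (1 / 3) * BAD_NODE_FAILURE

good = net_points(ROUNDS, PARTICIPATION, good_node_failure)
bad = net_points(ROUNDS, PARTICIPATION, BAD_NODE_FAILURE)
print(round(good), round(bad))  # 166667 0

# Each of the 14 good nodes ends up with the same share of the epoch award.
good_share = good / (14 * good + bad)
```

Because the bad node's points are floored at zero, `good_share` comes out to 1/14, the same share each good node would receive in a 14-node network without the offender.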

This solution does work, with the caveat that it must target a failure rate at which nodes should lose all their earnings. Given that we want a system with very high reliability, we want to target a failure rate much lower than 50%.

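In general, the equation relating the targeted failure rate and the points is as follows:

points_failure = points_success × (1 − targeted failure) / (targeted failure)

Given that points_success is always 10, this gives us:

points_failure = 10 × (1 − targeted failure) / (targeted failure)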

This is also an approximation because it does not take into account regional multipliers, how long it takes nodes to recover from failed rounds, or how different nodes' hardware and internet configurations may affect round times. It is sufficient for analysis at this time.
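As a rough sketch of that relationship, assume the targeted failure rate is the rate at which the points a node earns and the points it loses cancel out exactly. That assumption reproduces the 20-point deduction for a target of one third, and the same simplifications noted above apply. The function name is hypothetical.

```python
def failure_deduction(target_failure_rate: float, success_points: float = 10.0) -> float:
    """Points to deduct per realtime failure so that a node failing at exactly
    the targeted rate nets zero points.

    Derived from: success_points * (1 - f) == deduction * f
    """
    if not 0 < target_failure_rate < 1:
        raise ValueError("target_failure_rate must be strictly between 0 and 1")
    return success_points * (1 - target_failure_rate) / target_failure_rate


print(round(failure_deduction(1 / 3), 6))  # 20.0
print(round(failure_deduction(0.05), 6))   # 190.0
```

Lowering the target makes the deduction grow quickly, since the target sits in the denominator.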

The big question that remains is what the appropriate “targeted failure” rate should be. In general, it can be higher than one might expect, because points, and therefore earnings, still go down substantially when a node has a high failure rate that falls short of the target.

Over the last 12 hours, the highest realtime failure rates are as follows:

27.22%, 3.79%, 3.58%, 1.64%, 1.19%

The average failure rate is 0.5% (median 0.35%).

Given this data, the team believes we should move from targeting a failure rate of 33% down to 5%, which would bring the point deduction per realtime failure to 190.  
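To get a feel for what the change would mean, the sketch below compares expected points per round under the current 20-point deduction and the proposed 190-point deduction, using the failure rates quoted above. It treats each rate as the node's own overall failure rate and ignores multipliers and team effects, so it is a simplification for illustration and not a forecast of actual rewards.

```python
def expected_points_per_round(failure_rate: float, deduction: float,
                              success_points: float = 10.0) -> float:
    """Average points per round at a given failure rate, floored at zero."""
    net = success_points * (1 - failure_rate) - deduction * failure_rate
    return max(0.0, net)


# The five highest observed failure rates, followed by the reported 0.5% average.
observed_rates = [0.2722, 0.0379, 0.0358, 0.0164, 0.0119, 0.005]

for rate in observed_rates:
    current = expected_points_per_round(rate, deduction=20)
    proposed = expected_points_per_round(rate, deduction=190)
    print(f"{rate:7.2%}  current={current:5.2f}  proposed={proposed:5.2f}")
```

Under these assumptions a node at the 0.5% average keeps about 9 of its 10 points per round with the 190-point deduction, a node at 3.79% keeps roughly a quarter, and the node at 27.22% drops to zero.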

We would like to open this issue up for community discussion over the next few days and will revisit it, based on the response, on December 6th, 2021.

There is also a secondary stop-gap solution the team is working on. The ultimate flaw with economic solutions is that they can take time to have an effect, which is not great when messages are actively being dropped for the xx messenger and other users of the xxDK. As a result, the team will be publishing a list of currently poorly performing node operators, and xxDK users can opt in to not sending messages over rounds that contain their nodes. It will also be possible for xxDK users to opt in to other, separately curated lists.
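Conceptually, the client-side filter amounts to skipping any round whose team overlaps with the chosen list. The sketch below is purely illustrative: the node IDs, round IDs, data shapes, and function name are invented, and it does not reflect the actual xxDK API or the format of the published list.

```python
from typing import Iterable, Set


def round_is_acceptable(round_node_ids: Iterable[str], blocked_node_ids: Set[str]) -> bool:
    """True if none of the nodes in the round's team appear on the opted-in list."""
    return blocked_node_ids.isdisjoint(round_node_ids)


# Hypothetical opted-in list and two candidate rounds with 5-node teams.
blocked = {"node-C"}
candidate_rounds = {
    101: ["node-A", "node-B", "node-C", "node-D", "node-E"],
    102: ["node-F", "node-G", "node-H", "node-I", "node-J"],
}

usable_rounds = [
    round_id
    for round_id, team in candidate_rounds.items()
    if round_is_acceptable(team, blocked)
]
print(usable_rounds)  # [102]
```

A client that has opted in to several lists could pass the union of those lists as `blocked_node_ids`.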

The team will also be looking into making the team multiplier contingent on minimum performance, with more information coming soon.