Confusion Matrix in multidimensional case (MNIST dataset)

I couldn't understand how the values are placed in a multidimensional confusion matrix. Why has it confused 8 and 2?

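The confusion matrix is built with respect to the class labels, not the input dimensions. MNIST has 10 classes (the digits 0 to 9), so its confusion matrix is a 10 x 10 two-dimensional table with the true labels along one axis and the labels predicted by the model along the other. The entry in row i, column j counts how many samples whose true label is i were predicted as j (some libraries and plots use the transposed layout, so check the axis labels). Correct predictions fall on the diagonal; every off-diagonal entry is a confusion between two classes, such as an 8 predicted as a 2. The size of the matrix depends only on the number of output labels, not on the number of features or input dimensions.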
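To make the placement of values concrete, here is a minimal sketch in plain Python. The function name `confusion_matrix` and the toy label lists are made up for illustration (they are not real MNIST results), and rows are assumed to be true labels with columns as predicted labels.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count how often each true label (row) was predicted as each label (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Toy labels invented for illustration only, not actual model output.
y_true = [0, 1, 2, 2, 8, 8, 8, 9]
y_pred = [0, 1, 2, 8, 8, 2, 8, 9]

cm = confusion_matrix(y_true, y_pred, num_classes=10)

# Diagonal entry: true 8s that were correctly predicted as 8.
print("8 predicted as 8:", cm[8][8])
# Off-diagonal entry: true 8s that were wrongly predicted as 2.
print("8 predicted as 2:", cm[8][2])
```

Each sample adds exactly one count to one cell, so the matrix is always 10 x 10 for MNIST no matter how many pixels each image has.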

As for why it confused 8 and 2, that is difficult to say for certain. The model learns features at a very fine scale, and at that level it may have found some curves and edges of the two digits similar, which led to the confusion.
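The matrix cannot tell you why two digits were confused, but it does show which pairs are confused most, which is a reasonable place to start looking at misclassified images. A rough sketch, assuming rows are true labels and columns are predicted labels; `most_confused_pair` is a hypothetical helper and the 3 x 3 matrix is invented for illustration:

```python
def most_confused_pair(cm):
    """Return (true_label, predicted_label, count) for the largest off-diagonal entry."""
    best = None
    for true_label, row in enumerate(cm):
        for predicted_label, count in enumerate(row):
            if true_label == predicted_label:
                continue  # skip correct predictions on the diagonal
            if best is None or count > best[2]:
                best = (true_label, predicted_label, count)
    return best


# Small made-up matrix for three classes, not real results.
example_cm = [
    [50, 2, 1],
    [3, 45, 7],
    [0, 4, 48],
]

pair = most_confused_pair(example_cm)
print("most confused (true, predicted, count):", pair)
```

Once you know the pair, plotting a few of the misclassified samples for it usually makes the similarity visible (for example a 2 written with a closed loop).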

I hope this clears your doubt.