Another simple toy example shows us the importance of scaling. We use the same example as above except that the i-th column is multiplied by i/10 which means the i-th model variable has been divided by i/10.
d(m) F(m,n) iter Norm
--- ------------------------------------------------ ---- -----------
41. -6. -18. -7. -5. -36. 37. -19. -15. 21. -55. 1 11.59544849
33. 1. -17. 22. 35. -20. -2. -20. 23. -59. 50. 2 6.97337770
-58. 8. -10. 24. 18. -26. -31. 6. 69. 69. 50. 3 5.64414406
0. 10. 0. 0. 0. 0. 0. 0. 0. 0. 0. 4 4.32118177
0. 0. 20. 0. 0. 0. 0. 0. 0. 0. 0. 5 2.64755201
0. 0. 0. 30. 0. 0. 0. 0. 0. 0. 0. 6 2.01631355
0. 0. 0. 0. 40. 0. 0. 0. 0. 0. 0. 7 1.23219979
0. 0. 0. 0. 0. 50. 0. 0. 0. 0. 0. 8 0.36649203
0. 0. 0. 0. 0. 0. 60. 0. 0. 0. 0. 9 0.28528941
0. 0. 0. 0. 0. 0. 0. 70. 0. 0. 0. 10 0.06712411
0. 0. 0. 0. 0. 0. 0. 0. 80. 0. 0. 11 0.00374284
0. 0. 0. 0. 0. 0. 0. 0. 0. 90. 0. 12 -0.00000040
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 100. 13 0.00000000
We observe that solving the same problem for the scaled variables
has required a severe increase
in the number of iterations required to get the solution.
We lost the benefit of the second CG miracle.
Even the rapid convergence predicted for the 10-th iteration
is delayed until the 12-th.