Let $\{|a_n\rangle\}$ be an orthonormal basis of eigenvectors of the operator A, $A|a_n\rangle = a_n|a_n\rangle$.  Assume that the eigenvalues are not degenerate.  Let $|\Psi_1\rangle$ and $|\Psi_2\rangle$ be two normalized eigenvectors of the operator B with eigenvalues $b_1$ and $b_2$ respectively.
$B|\Psi_1\rangle = b_1|\Psi_1\rangle$,  $B|\Psi_2\rangle = b_2|\Psi_2\rangle$.
(If B is the Hamiltonian H, then b1 = E1 and b2 = E2.)
Assume A and B do not commute.

Assume at t = 0 the system is in the state |Ψ1>.  (If B is the Hamiltonian, then the system is in a stationary state.)
The probability that a measurement of A will yield the eigenvalue $a_n$ is $P_1(a_n) = |\langle a_n|\Psi_1\rangle|^2$.   (4th postulate of quantum mechanics)
Similarly, if at t = 0 the system is in state $|\Psi_2\rangle$, then $P_2(a_n) = |\langle a_n|\Psi_2\rangle|^2$.
Now consider a normalized state |Ψ> which is a linear superposition of |Ψ1> and |Ψ2>.
The system is not in an eigenstate of B.
(If B is the Hamiltonian, then the system is not in a stationary state; it is in a coherent superposition of stationary states.)
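$|\Psi\rangle = \lambda_1|\Psi_1\rangle + \lambda_2|\Psi_2\rangle$,  $\langle\Psi|\Psi\rangle = 1 \;\Rightarrow\; |\lambda_1|^2 + |\lambda_2|^2 = 1$  (using $\langle\Psi_1|\Psi_2\rangle = 0$, which holds for $b_1 \neq b_2$).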
The probability that a measurement of B will yield $b_1$ is  $|\langle\Psi_1|\Psi\rangle|^2 = |\lambda_1|^2$.
The probability that a measurement of B will yield $b_2$ is  $|\langle\Psi_2|\Psi\rangle|^2 = |\lambda_2|^2$.
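The probability that a measurement of A will yield $a_n$ is
$P(a_n) = |\langle a_n|\Psi\rangle|^2 = \langle a_n|\Psi\rangle\langle\Psi|a_n\rangle$
$= (\lambda_1\langle a_n|\Psi_1\rangle + \lambda_2\langle a_n|\Psi_2\rangle)(\lambda_1^*\langle\Psi_1|a_n\rangle + \lambda_2^*\langle\Psi_2|a_n\rangle)$
$= |\lambda_1|^2|\langle a_n|\Psi_1\rangle|^2 + |\lambda_2|^2|\langle a_n|\Psi_2\rangle|^2 + \lambda_1^*\lambda_2\langle a_n|\Psi_2\rangle\langle\Psi_1|a_n\rangle + \lambda_1\lambda_2^*\langle a_n|\Psi_1\rangle\langle\Psi_2|a_n\rangle$
$= |\lambda_1|^2P_1(a_n) + |\lambda_2|^2P_2(a_n) + 2\,\mathrm{Re}\{\lambda_1\lambda_2^*\langle a_n|\Psi_1\rangle\langle\Psi_2|a_n\rangle\}$
$\neq |\lambda_1|^2P_1(a_n) + |\lambda_2|^2P_2(a_n)$  in general.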

If we had a statistical mixture of states $|\Psi_1\rangle$ and $|\Psi_2\rangle$ with weights $|\lambda_1|^2$ and $|\lambda_2|^2$ respectively, i.e. if we had a collection of systems with some in state $|\Psi_1\rangle$ and some in state $|\Psi_2\rangle$, then the probability of measuring $b_i$ (i = 1, 2) would be $|\lambda_i|^2$ and the probability of measuring $a_n$ would be $|\lambda_1|^2P_1(a_n) + |\lambda_2|^2P_2(a_n)$.
A linear superposition is not a statistical mixture.
If a system is in a linear superposition of eigenstates of an observable B and we measure an observable A which does not commute with B, then we must take interference effects into account when predicting the result of the measurement.
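
The difference between a superposition and a mixture can be checked numerically.  The following is a minimal sketch, assuming a two-level system in which A has the eigenbasis of the Pauli matrix σz and B has the eigenbasis of σx.  These choices, the equal weights λ1 = λ2 = 1/√2, and the helper names `inner` and `prob` are illustrative assumptions, not part of the notes.

```python
import math

def inner(bra, ket):
    """Inner product <bra|ket> for states stored as lists of numbers."""
    return sum(complex(x).conjugate() * y for x, y in zip(bra, ket))

def prob(a, state):
    """Probability |<a|state>|^2 of finding eigenvector a in the given state."""
    return abs(inner(a, state)) ** 2

s = 1 / math.sqrt(2)

# Assumed eigenvector |a_n> of A (the "up" eigenvector of sigma_z).
a_n = [1.0, 0.0]

# Assumed eigenvectors |Psi_1>, |Psi_2> of B (eigenvectors of sigma_x).
psi1 = [s, s]
psi2 = [s, -s]

# Superposition |Psi> = lam1 |Psi_1> + lam2 |Psi_2> with |lam1|^2 + |lam2|^2 = 1.
lam1, lam2 = s, s
psi = [lam1 * x + lam2 * y for x, y in zip(psi1, psi2)]

P1, P2 = prob(a_n, psi1), prob(a_n, psi2)

# Statistical mixture: weighted sum of probabilities, no cross term.
mixture = abs(lam1) ** 2 * P1 + abs(lam2) ** 2 * P2

# Cross term 2 Re{ lam1 lam2* <a_n|Psi_1> <Psi_2|a_n> }.
cross = 2 * (lam1 * complex(lam2).conjugate()
             * inner(a_n, psi1) * inner(psi2, a_n)).real

# Linear superposition: squared modulus of the summed amplitudes.
superposition = prob(a_n, psi)

print(f"mixture       = {mixture:.3f}")
print(f"cross term    = {cross:.3f}")
print(f"superposition = {superposition:.3f}")
```

With these assumed states the mixture gives 0.5 while the superposition gives 1, and the difference is exactly the cross term.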

Quantum beats
Let B = H, b1 = E1, b2 = E2.
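Then  $U(t,0) = \exp(-iHt/\hbar)$,  $|\Psi_1(t)\rangle = \exp(-iE_1t/\hbar)|\Psi_1(0)\rangle$,   $|\Psi_2(t)\rangle = \exp(-iE_2t/\hbar)|\Psi_2(0)\rangle$.
$|\Psi(0)\rangle = \lambda_1|\Psi_1(0)\rangle + \lambda_2|\Psi_2(0)\rangle$.
$|\Psi(t)\rangle = U(t,0)|\Psi(0)\rangle = \lambda_1\exp(-iE_1t/\hbar)|\Psi_1(0)\rangle + \lambda_2\exp(-iE_2t/\hbar)|\Psi_2(0)\rangle$.
Now the cross term in $P(a_n)$ becomes  $2\,\mathrm{Re}\{\lambda_1\lambda_2^*\exp(-i(E_1 - E_2)t/\hbar)\langle a_n|\Psi_1(0)\rangle\langle\Psi_2(0)|a_n\rangle\}$,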
i.e. it becomes time dependent and oscillates with the angular frequency $\omega_{12} = |E_1 - E_2|/\hbar$, corresponding to a frequency $f_{12} = |E_1 - E_2|/h$.
$P(a_n)$ oscillates with time; we observe quantum beats.
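
The beat can be made explicit with a short numerical sketch.  It assumes a two-level system whose stationary states are the σx eigenvectors, an observable A with the σz eigenbasis, units in which ħ = 1, and hypothetical energies E1 = 1 and E2 = 3.  None of these values comes from the notes.

```python
import cmath
import math

HBAR = 1.0          # assumption: units with hbar = 1
E1, E2 = 1.0, 3.0   # hypothetical energy eigenvalues

s = 1 / math.sqrt(2)
a_n = [1.0, 0.0]       # assumed eigenvector |a_n> of A
psi1_0 = [s, s]        # assumed stationary state |Psi_1(0)>
psi2_0 = [s, -s]       # assumed stationary state |Psi_2(0)>
lam1, lam2 = s, s      # superposition coefficients at t = 0

def inner(bra, ket):
    """Inner product <bra|ket> for states stored as lists of numbers."""
    return sum(complex(x).conjugate() * y for x, y in zip(bra, ket))

def prob_a(t):
    """P(a_n) at time t for |Psi(t)> = sum_i lam_i exp(-i E_i t / hbar) |Psi_i(0)>."""
    c1 = lam1 * cmath.exp(-1j * E1 * t / HBAR)
    c2 = lam2 * cmath.exp(-1j * E2 * t / HBAR)
    psi_t = [c1 * x + c2 * y for x, y in zip(psi1_0, psi2_0)]
    return abs(inner(a_n, psi_t)) ** 2

# Beat period T = 2 pi hbar / |E1 - E2| = h / |E1 - E2|.
period = 2 * math.pi * HBAR / abs(E1 - E2)

for k in range(5):
    t = k * period / 4
    print(f"t = {t:.3f}   P(a_n) = {prob_a(t):.3f}")
```

For these assumed states the probability follows P(an) = cos²((E1 - E2)t/2ħ): it starts at 1, drops to 0 after half a period, and returns to 1 after a full period.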

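We may consider $P(a_n) = |\langle a_n|\Psi\rangle|^2$ as the squared modulus of the probability amplitude $\langle a_n|\Psi\rangle$.  The probability amplitude $\langle a_n|\Psi\rangle = \lambda_1\langle a_n|\Psi_1\rangle + \lambda_2\langle a_n|\Psi_2\rangle$ is the weighted sum of the probability amplitudes $\langle a_n|\Psi_1\rangle$ and $\langle a_n|\Psi_2\rangle$, whose squared moduli are $P_1(a_n)$ and $P_2(a_n)$.  To obtain the probability $P(a_n)$ for a linear superposition of states, we take the squared modulus of the weighted sum of the probability amplitudes, not the weighted sum of their squared moduli.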

Summary:
If a system is in a coherent superposition of stationary states (i.e. not in an eigenstate of the Hamiltonian) and we measure an observable that does not commute with the Hamiltonian, the probability of obtaining a particular result oscillates in time.  These quantum beats are an interference effect.
Summation over intermediate states
For simplicity, assume that the observables A, B, C are not degenerate and do not commute with one another.  Let $\{|u_a\rangle\}$ be an orthonormal eigenbasis of A, $\{|w_b\rangle\}$ of B, and $\{|v_c\rangle\}$ of C.  Assume a measurement of A is made and after the measurement the system is in the state $|u_a\rangle$.  Now perform two experiments.
(1)  Without giving the system time to evolve, measure C.  The probability of obtaining the eigenvalue c is
$P_a(c) = |\langle v_c|u_a\rangle|^2$.
(2)  Without giving the system time to evolve, measure B and then measure C.  The probability of obtaining the eigenvalue b and then the eigenvalue c is
$P_a(b,c) = P_a(b)P_b(c) = |\langle w_b|u_a\rangle|^2|\langle v_c|w_b\rangle|^2$.
Assume that in both experiments the system is in state |vc> after all measurements have been completed.  But in the second experiment we know the path that the system has taken from |ua> to |vc>, i.e. we know the intermediate state |wb>.  In the first experiment the intermediate state is undetermined.
Can the probability Pa(c) for the first experiment be obtained by summing over all intermediate states?
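Using the closure relation $\sum_b |w_b\rangle\langle w_b| = 1$, we may write
$P_a(c) = |\langle v_c|u_a\rangle|^2 = |\sum_b \langle v_c|w_b\rangle\langle w_b|u_a\rangle|^2 = \sum_b \langle v_c|w_b\rangle\langle w_b|u_a\rangle \sum_{b'} \langle v_c|w_{b'}\rangle^*\langle w_{b'}|u_a\rangle^*$
$= \sum_b |\langle v_c|w_b\rangle\langle w_b|u_a\rangle|^2 + \sum_b\sum_{b'\neq b} \langle v_c|w_b\rangle\langle w_b|u_a\rangle\langle v_c|w_{b'}\rangle^*\langle w_{b'}|u_a\rangle^*$
$= \sum_b P_a(b,c) + \sum_b\sum_{b'\neq b} \langle v_c|w_b\rangle\langle w_b|u_a\rangle\langle v_c|w_{b'}\rangle^*\langle w_{b'}|u_a\rangle^*$
$\neq \sum_b P_a(b,c)$  in general.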

Again we encounter cross terms, leading to interference between different paths.  Just summing the probabilities for each possible path does not lead to the correct result.  All the interference effects are then missing.  When the intermediate state of the system is not determined, it is the probability amplitudes for the different intermediate states that must be summed, not the probabilities.
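
A small numerical check of this statement, as a sketch under stated assumptions: a two-level system with the σz eigenbasis for A, the σx eigenbasis for B, and for C a real basis rotated by a hypothetical angle θ = π/6.  The bases and variable names are illustrative only.

```python
import math

def inner(bra, ket):
    """Inner product <bra|ket> for states stored as lists of numbers."""
    return sum(complex(x).conjugate() * y for x, y in zip(bra, ket))

s = 1 / math.sqrt(2)
theta = math.pi / 6   # hypothetical rotation angle defining the eigenbasis of C

u_a = [1.0, 0.0]                            # state after the measurement of A
w = [[s, s], [s, -s]]                       # assumed eigenbasis {|w_b>} of B
v_c = [math.cos(theta), math.sin(theta)]    # assumed eigenvector |v_c> of C

# Experiment (1): measure C directly.
direct = abs(inner(v_c, u_a)) ** 2

# Sum the amplitudes over all intermediate states, then take the squared modulus.
amplitude = sum(inner(v_c, w_b) * inner(w_b, u_a) for w_b in w)
coherent = abs(amplitude) ** 2

# Experiment (2), summed over outcomes b: add the path probabilities P_a(b, c).
incoherent = sum(abs(inner(v_c, w_b)) ** 2 * abs(inner(w_b, u_a)) ** 2
                 for w_b in w)

print(f"direct     = {direct:.3f}")
print(f"coherent   = {coherent:.3f}")
print(f"incoherent = {incoherent:.3f}")
print(f"cross      = {coherent - incoherent:.3f}")
```

For this choice the sum of amplitudes reproduces the direct result 0.75, while the sum of path probabilities gives 0.5; the missing 0.25 is the interference term.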

Why?
The fifth postulate of Quantum Mechanics states that during the measurement of B in the second experiment, the state of the system abruptly changes from |ua> to |wb>.  (The state vector contains all the information that an observer can have about the system, and that information abruptly changes.)  The measurement changes the state vector.  This change is responsible for the disappearance of the interference effects.

Conclusions
• (i)  Always take the squared modulus of the probability amplitudes to obtain the probabilistic predictions of Quantum Mechanics.
• (ii)  When no measurement of an intermediate state is made, always sum the probability amplitudes, not the probabilities.
• (iii)  For a system in a linear superposition of states, the probability amplitude is the sum of the partial amplitudes.  The probability is the squared modulus of this sum.  The partial amplitudes interfere with each other.