It is well modeled by diffusion. That's how Kimura derived his results. Joe Felsenstein sketches the math in his online book, if you want to take a look. You generally end up with the same results by coalescent methods, which is the approach I'm more familiar with.
The mean time to the most recent common ancestor for an idealized diploid population is 4N, where N is the population size. The standard deviation is also 4N, so you do have to wait quite a while. For my simulation of a constant-sized population, I burned in for 20N generations as a reasonable approximation.
For the simulations I used a mutation rate of 0.1 mutation/genome/generation, so there's no direct comparison with genomic data; you have to scale one way or another.
That's not my intuition. Fluctuations are just from binomial sampling, whose variance is symmetric around (and peaks at) f = 0.5. Unfortunately, I don't have a good intuition to replace it with. I do note that allele frequencies diffuse toward higher values until each frequency class contributes equally to ultimate fixation, but it's not obvious to me why that has to be the stationary condition.
The number of segregating SNPs plateaus. Not for modern humans, of course, since we have an immense population and we won't plateau until after the sun has consumed the earth, but for our ancestral population, the number of variant sites was more or less constant. I believe that's what's meant by steady state here.
Well, here's where it was being derived originally (eq. 7 is the relevant one, I believe).
For a constant-sized population, the shape of the distribution is independent of the population size. There may be closed-form solutions for other simple demographic histories (well, for exponential expansion, anyway), but mostly one just simulates what the frequency spectrum would look like under some scenario. There are approaches for inferring demographic history from allele frequency spectra, e.g. dadi and psmc
Since the spectrum is independent of population size, as you have noted, these approaches can only determine changes to population size; overall size and thus overall dating has to be calibrated from some other input. This can be from recombination, as @Swamidass has pointed out, or from the total number of variants -- or more realistically, from the diversity seen in pairwise comparisons, so that you don't have to sample the entire population. The mean number of differences between pairs of genomes is just equal to 4Nµ, where µ is the mutation rate. This will give you an answer with some uncertainty -- maybe as much as a factor of two -- but easily good enough to rule out recent tiny population sizes. My opinion is that diversity gives a better constraint than recombination, since the latter varies a lot across the genome and between genetically different individuals.