It is defined as the expected, not actual, reproductive success of the trait in question. (Either the absolute success, or success relative to an alternative trait – one can define both absolute and relative fitness.)
Measuring fitness is quite different from fitness itself, and you seem to be confusing the two. The fitness of a trait is how reproductively successful it would be if you could generate identical populations many times and measure its average success. Any particular history of reproductive success provides an estimate – a measurement – of the fitness; it isn’t the fitness itself. In the case of a trait at an appreciable frequency in a very large population, like a lab flask of bacteria, the actual history provides a very good estimate of the real fitness. For a new mutation or for a small population, the actual history provides a very noisy measurement.
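To make the distinction concrete, here is a rough simulation sketch in Python. Everything in it is an illustrative assumption rather than a standard model: each carrier of the trait is assumed to leave a Poisson-distributed number of offspring whose mean is the "true" fitness, the value 1.1 is arbitrary, and the function names are made up for this example. Each replicate stands in for one "actual history"; the spread across replicates shows how noisy a single history is as a measurement.

```python
import math
import random
import statistics


def poisson(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def measured_fitness(true_fitness, carriers, rng):
    """One 'actual history': average offspring count over the carriers.

    Assumption: offspring number per carrier is Poisson with mean true_fitness.
    """
    offspring = sum(poisson(true_fitness, rng) for _ in range(carriers))
    return offspring / carriers


def spread_of_measurements(true_fitness, carriers, replicates, seed=0):
    """Replay the same population many times; return (mean, stdev) of the estimates."""
    rng = random.Random(seed)
    estimates = [
        measured_fitness(true_fitness, carriers, rng) for _ in range(replicates)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    # Hypothetical numbers: a rare new mutation vs. a trait common in a big flask.
    for carriers in (5, 2000):
        mean, sd = spread_of_measurements(1.1, carriers, replicates=200)
        print(f"carriers={carriers}: mean estimate={mean:.3f}, spread={sd:.3f}")
```

Under these assumptions the true fitness is the same in both cases; only the quality of the measurement changes. With a handful of carriers the individual histories scatter widely around 1.1, while with thousands of carriers nearly every history lands close to it.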
I’m pretty sure that’s false and it’s certainly irrelevant. False, because I think most possible traits are unconditionally deleterious or (often) nonviable. Irrelevant because fitness is defined on a particular genetic background and in a particular environment. It’s perfectly well understood that the same trait may be beneficial in one environment or on one genetic background and deleterious in another situation.
Again, you’re confusing the fitness itself with accurate measurement of the fitness. Just because I sometimes have no good way of measuring something doesn’t make that something a scientifically useless concept. According to statistical mechanics, the temperature of the air around me is a measure of the average kinetic energy of all of the gas molecules in it. Calculating that average would seem to require a complete knowledge of the velocity of each and every one of septillions of molecules, which is utterly impossible. And yet “temperature” continues to be a useful scientific concept anyway, as does the microscopic definition of temperature.