Current microprocessors utilise the instruction-level parallelism by a deep processor pipeline and the superscalar instruction issue technique. VLSI technology offers several solutions for aggressive exploitation of the instruction-level parallelism in future generations of microprocessors. Technological advances will replace gate delay by on-chip wire delay as the main obstacle to increase the chip complexity and cycle rate. The implication for the microarchitecture is that functionally partitioned designs with strict nearest neighbour connections must be developed. Among the major problems facing the microprocessor designers is the application of even higher degree of speculation in combination with functional partitionong of the processor, which prepares the way for exceeding the classical dataflow limit imposed by data dependences. In this paper we survey the current approaches to solving this problem, in particular we analyse several new research directions whose solutions are based on the complex uniprocessor architecture. A uniprocessor chip features a very aggressive superscalar design combined with a trace cache and superspeculative techniques. Superspeculative techniques exceed the classical dataflow limit where even with unlimited machine resources a program cannot execute any faster than the execution of the longest dependence chain introduced by the program's data dependences. Superspeculative processors also speculate about control dependences. The trace cache stores the dynamic instruction traces contiguously and fetches instructions from the trace cache rather than from the instruction cache. Since a dynamic trace of instructions may contain multiple taken branches, there is no need to fetch from multiple targets, as would be necessary when predicting multiple branches and fetching 16 or 32 instructions from the instruction cache. Multiscalar and trace processors define several processing cores that speculatively execute different parts of a sequential program in parallel. Multiscalar processors use a compiler to partition the program segments, whereas a trace processor uses a trace cache to generate dynamically trace segments for the processing cores. A datascalar processor runs the same sequential program redundantly on several processing elements where each processing element has different data set. This paper discusses and compares the performance potential of these complex uniprocessors.