Actually, trisetup rate is a fundamental characteristic of a rasterizer.
Ever since Fermi, trisetup has been carried out in the PolyMorph (tm) (r) (c) engine, which has traditionally been coupled to an SM: one PolyMorph engine per SM. Now (and here comes the tricky part), every PolyMorph engine can do trisetup so that its own SM can carry out its part of a triangle. So there's the implicit notion that a triangle may (and often will) span multiple SMs. And indeed, a triangle has to pass through some other stages before it even reaches the trisetup stage (carried out by the PolyMorph), and that's where a sort of bottleneck appears. A triangle can be dispatched to one or more GPCs (Graphics Processing Clusters) based on its screen footprint, but a GPC does one tri per clock. So since a GTX 1080 has 4 GPCs, that translates to 4 tris/clock, even though it has as many as 5 SMs per GPC (20 SMs in total), each with a PolyMorph engine capable of one trisetup per clock. Alas.

The entire issue stems from this 'a tri shall span multiple GPCs' notion, and it is technically (and, most importantly, statistically) true: tris more often than not are big enough to span multiple screen tiles, which are the unit of work for a GPC and the SMs within it. Here's a very informative read on the subject. Now, since Tegras have traditionally had just one GPC, they're traditionally 1 tri/clock.
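The following back-of-the-envelope Python sketch illustrates the bottleneck described above. It is a simplified model, not NVIDIA's actual pipeline. Several details are hypothetical assumptions: the 16-pixel tile size, the bounding-box tile test, the diagonal interleave of tiles across GPCs, and the SM count in the Tegra-like config. Only the 1 tri/clock per-GPC cap and the GTX 1080's 4 GPCs x 5 SMs come from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Gpu:
    gpcs: int
    sms_per_gpc: int
    tris_per_clock_per_gpc: int = 1  # per-GPC cap, as described in the post

    @property
    def sms(self) -> int:
        return self.gpcs * self.sms_per_gpc

    def peak_tris_per_clock(self) -> int:
        # The GPC count, not the per-SM PolyMorph count, sets the cap.
        return self.gpcs * self.tris_per_clock_per_gpc


def tiles_covered(xs, ys, tile=16):
    """Screen tiles touched by a triangle's bounding box (conservative).

    The 16-pixel tile size is a hypothetical assumption.
    """
    x0, x1 = math.floor(min(xs)) // tile, math.floor(max(xs)) // tile
    y0, y1 = math.floor(min(ys)) // tile, math.floor(max(ys)) // tile
    return {(tx, ty) for tx in range(x0, x1 + 1) for ty in range(y0, y1 + 1)}


def gpcs_touched(tiles, num_gpcs):
    # Hypothetical diagonal interleave of screen tiles across GPCs.
    return {(tx + ty) % num_gpcs for tx, ty in tiles}


gtx1080 = Gpu(gpcs=4, sms_per_gpc=5)
tegra_like = Gpu(gpcs=1, sms_per_gpc=2)  # SM count is illustrative only

print(gtx1080.sms, gtx1080.peak_tris_per_clock())
print(tegra_like.peak_tris_per_clock())

big = tiles_covered((10, 40, 20), (5, 8, 30))
print(len(big), gpcs_touched(big, gtx1080.gpcs))
```

In this toy model, the GTX 1080 config reports 20 SMs but a peak of only 4 tris/clock. A triangle roughly 30 pixels across touches 6 tiles and, under the assumed interleave, all 4 GPCs, which is why per-GPC limits rather than per-SM ones end up governing setup throughput.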