Skylake: Deconstructing Intel's 6th generation Core processor
Inside Intel Skylake
Intel's 6th generation Core microprocessor family (Skylake) has arrived, but what exactly is new? In this article, we take a closer look at its architecture and capabilities.
Moving the manufacturing process to 14nm (from Haswell's 22nm) only nets selective gains in clock frequency or power savings, depending on where the silicon is deployed. To truly make an impact, the microarchitecture has to evolve, and that is what Skylake focuses on.
This is why the Skylake architecture and the 6th generation Core processors are a far more significant CPU advancement than Broadwell, which kept much of Haswell's microarchitecture and relied on the die shrink for most of its gains. Broadwell was consequently aimed at mobile computing, where it delivered more efficient products, while the desktop scene relied on the refreshed Haswell lineup and the likes of Devil's Canyon to hold out until Skylake arrived.
Eking more performance out of the Core
First off, Skylake increases per-core performance with a bigger, wider core. This was achieved through a larger instruction window with deeper buffers, and improved branch predictors that raise parallelism in core execution. There are more execution units than before, and they have been tweaked for lower latencies and better power management, such as shutting down when not in use. With security an ever-growing concern, Intel also accelerated AES encryption, speeding up the AES-GCM and AES-CBC modes by 17% and 33% respectively.
To feed the wider core, Intel also made bandwidth improvements, such as doubling its load and store buffer size to 128 bytes per cycle, improving the prefetchers and page miss handling, and raising L2 cache miss bandwidth along with cache management as a whole. It doesn't stop there. The last level cache (LLC, which in Skylake's case is its L3 cache) and the ring bus that enables data sharing among all of the cores and the integrated graphics engine have been reworked to improve cache miss handling, doubling throughput with only 50% more fabric (hardware). As such, the ring can run at lower frequencies while maintaining similar throughput, which saves power. Coupled with increased processor frequency, better Hyper-Threading performance and a new integrated memory controller that works with DDR3L and speedier DDR4-2133 memory, all of these increase the bandwidth available to a higher performing and wider core.
For higher performing processors that come with embedded DRAM (eDRAM) on the CPU package to accelerate integrated graphics performance, Intel has modified the eDRAM topology so that it now operates as a memory side cache. It is now fully coherent, caches any data type and can function somewhat like an L4 cache, but one that is directly usable by I/O devices and the display engine. We'll share a few more notes on eDRAM availability when we discuss Skylake's graphics engine, otherwise known as Intel's Gen.9 graphics architecture (more on that later).
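The ring bus trade-off described above can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic: it assumes throughput scales linearly with the ring clock, which is a simplification of ours and not something Intel has stated, and the function names are purely illustrative.

```python
# Back-of-the-envelope model of the ring bus claim.
# Assumption (ours, not Intel's): throughput scales linearly with ring clock.

THROUGHPUT_GAIN = 2.0  # "double the throughput" (figure quoted in the article)
FABRIC_GAIN = 1.5      # "50% more fabric" (figure quoted in the article)


def throughput_per_fabric(throughput_gain: float, fabric_gain: float) -> float:
    """Relative throughput delivered per unit of fabric hardware."""
    return throughput_gain / fabric_gain


def clock_for_same_throughput(throughput_gain: float) -> float:
    """Fraction of the old ring clock needed to match the old throughput."""
    return 1.0 / throughput_gain


if __name__ == "__main__":
    efficiency = throughput_per_fabric(THROUGHPUT_GAIN, FABRIC_GAIN)
    clock = clock_for_same_throughput(THROUGHPUT_GAIN)
    print(f"Throughput per unit of fabric: {efficiency:.2f}x")
    print(f"Ring clock needed for the old throughput: {clock:.0%}")
```

Under that linear assumption, the new ring would need only about half the clock to match the previous generation's throughput, which is where the power saving would come from.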
Clamping down on power consumption and increasing power efficiency
Next on the agenda for Skylake is power efficiency. To achieve this, Intel took a bottom-up design approach to the processor architecture with a focus on power reduction. Fine-grained power gating is applied across many areas of the processor, such as the Core, system agent, graphics engine and platform I/O, by implementing additional Intel Speed Step technology domains that optimize performance and power draw for low bandwidth usage. Skylake is also designed more closely around scenario based power consumption (such as media playback), and it targets power draw during low utilization with idle power reduction and reduced C1 state power draw. For notebooks, the display panel is usually one of the highest power consuming components of the system, and there have been improvements in display controller management. For example, moving from a 2,560 x 1,440 pixels resolution panel to a 3,200 x 1,800 pixels resolution panel incurs roughly 1.6x more bandwidth, but the power draw required is now only 1.2x more.
For the very first time, the microarchitecture integrates a digital PLL for substantial power savings across the chip. Skylake chips also operate at a lower Vmin while maintaining clock speeds similar to their predecessors, which further reduces overall power draw. These savings also leave more of the active power budget for compute intensive CPU or GPU tasks, which is especially useful in power limited form factors such as fanless 2-in-1 convertible notebook designs.
On a related note, Skylake now offers configurable power management options that let OEMs dictate the skin temperature configuration of their devices, and even individual voltage regulator controls for setting battery current drive limits and more. This gives OEMs better control over high junction temperature workloads, where smoother frequency scaling and less throttling can be expected. It is useful when OEMs spend considerable time optimizing their platform through internal testing to deliver the best performing computing platform possible for a particular set of design parameters. Devices like notebooks and tablets can be built from different materials, each with its own heat emitting properties, while human touch has different levels of tolerance depending on the design and build of a product. Add operational climate differences across continents, and configurable power management is a step in the right direction towards a better consumer product.
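The display figures are easy to sanity-check. The short sketch below assumes that display bandwidth scales with pixel count at the same refresh rate and colour depth (our assumption), and takes the 1.2x power figure as quoted above; the function names are illustrative only.

```python
# Sanity check of the display bandwidth claim.
# Assumption (ours): bandwidth scales with pixel count at equal refresh rate
# and colour depth. The 1.2x power ratio is the figure quoted in the article.


def pixel_count(width: int, height: int) -> int:
    """Total number of pixels on a panel."""
    return width * height


def scaling(old_res, new_res, power_ratio):
    """Return (pixel ratio, relative power per pixel) between two panels."""
    pixel_ratio = pixel_count(*new_res) / pixel_count(*old_res)
    power_per_pixel = power_ratio / pixel_ratio
    return pixel_ratio, power_per_pixel


pixel_ratio, power_per_pixel = scaling((2560, 1440), (3200, 1800), 1.2)
print(f"Pixel ratio: {pixel_ratio:.4f}x")
print(f"Relative power per pixel: {power_per_pixel:.3f}")
```

The pixel ratio works out to about 1.56x, which matches the "roughly 1.6x" bandwidth figure, and a 1.2x power increase would mean each pixel costs around 23% less power to drive than before.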
But wait, there's more to the power-performance efficiency angle in Skylake, and the next one is interesting: the new Intel Speed Shift technology. Traditionally, the operating system mediates with the power control unit (PCU) on the processor to determine the power state to apply for a given workload. With Speed Shift, the 6th generation Core processor communicates directly with the PCU, reducing a typical 30ms delay to just 1ms. This lets the processor ramp its clocks up and down far more quickly (extremely fast P-state changes) and thus complete its tasks faster and more efficiently.
To illustrate the benefit in another context, imagine two racing cars that have to complete a lap of a circuit, but only one has Speed Shift technology. On a long straight (akin to a CPU that's busy rendering a video), both cars would complete that part of the lap in a similar time. However, it gets tricky with turns and bends, as the cars have to slow down considerably to negotiate them (akin to a pause or a reduction of workload, where the processor steps through different power states as it sees fit). This is where Intel's Speed Shift technology promises much quicker gear changes up and down, and faster transfer of power to the drive train. Over the whole lap, the car equipped with Speed Shift will finish sooner than a competing car without it. This technology will only be present in the Core i-series of Skylake processors.
The expected end-user experience from Intel Speed Shift technology is a more responsive system, faster resume and the like.
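To get a feel for why the shorter delay matters for bursty work, here is a toy model. It is a sketch under our own assumptions and not a description of how the PCU behaves: the core is assumed to sit at a fixed fraction of its top clock until the ramp completes, then jump straight to full speed. The 20ms burst length and the 0.5 low-speed fraction are hypothetical values chosen for illustration; only the 30ms and 1ms delays come from the article.

```python
# Toy model of a short burst of work starting from a low clock.
# Assumption (ours): the core runs at `low_speed_fraction` of its top clock
# until the ramp delay has elapsed, then switches instantly to the top clock.


def burst_time_ms(work_ms_at_full_speed: float,
                  ramp_latency_ms: float,
                  low_speed_fraction: float) -> float:
    """Time to finish a burst that would take `work_ms_at_full_speed` at top clock."""
    work_done_during_ramp = ramp_latency_ms * low_speed_fraction
    if work_done_during_ramp >= work_ms_at_full_speed:
        # The burst finishes before the clock ever ramps up.
        return work_ms_at_full_speed / low_speed_fraction
    # Ramp period at low speed, then the remainder at full speed.
    return ramp_latency_ms + (work_ms_at_full_speed - work_done_during_ramp)


if __name__ == "__main__":
    BURST_MS = 20.0      # hypothetical burst length
    LOW_FRACTION = 0.5   # hypothetical idle clock as a fraction of top clock
    for latency in (30.0, 1.0):  # delays quoted in the article
        total = burst_time_ms(BURST_MS, latency, LOW_FRACTION)
        print(f"{latency:>4.0f}ms ramp delay: burst completes in {total}ms")
```

The longer the sustained load, the less the ramp delay matters in this model; it is short, interactive bursts that stand to gain the most.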
New security technologies
The new Skylake core supports Intel Software Guard Extensions (Intel SGX) technology, which brings the following advantages:
- Supports new instructions and flows to create and isolate enclaves from malware and privileged software attacks
- Enables usage of trusted memory regions (trusted enclaves)
Further to that, Skylake implements Intel Memory Protection Extensions (Intel MPX), which are new instructions that enable runtime buffer overflow checks to ensure that a memory access falls within the bounds of the memory allocated to the calling process.
To sum up the SGX and MPX features in the core: if running software fails these checks, it will not be allowed to execute.
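The idea behind MPX-style bounds checking can be sketched in software. The example below is a conceptual model only: it does not use MPX instructions (Python is already memory-safe), and the `CheckedBuffer` and `BoundsViolation` names are hypothetical. It simply mirrors the principle of comparing every access against a stored lower and upper bound before it is allowed to proceed.

```python
# Conceptual model of bounds-checked memory access.
# This is NOT Intel MPX; all class and method names here are hypothetical.


class BoundsViolation(Exception):
    """Raised when an access falls outside the allocated bounds."""


class CheckedBuffer:
    def __init__(self, size: int) -> None:
        self._data = bytearray(size)
        self._lower = 0          # lowest valid index
        self._upper = size - 1   # highest valid index

    def _check(self, index: int) -> None:
        # Compare against both bounds before touching memory.
        if index < self._lower or index > self._upper:
            raise BoundsViolation(
                f"index {index} outside [{self._lower}, {self._upper}]"
            )

    def write(self, index: int, value: int) -> None:
        self._check(index)
        self._data[index] = value

    def read(self, index: int) -> int:
        self._check(index)
        return self._data[index]


if __name__ == "__main__":
    buf = CheckedBuffer(4)
    buf.write(3, 7)          # within bounds
    try:
        buf.write(4, 1)      # one past the end: a classic overflow
    except BoundsViolation as err:
        print(f"Blocked: {err}")
```

In hardware, the appeal is that such checks run as dedicated instructions rather than as extra branches written by the programmer.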
The overclocking edge
For enthusiasts, overclocking headroom should increase thanks to a design change that removes the fully integrated voltage regulator (FIVR) that was present in Haswell and Broadwell chips. The FIVR probably made sense as a mobility-first CPU design choice, but on overclocking friendly desktop processors it works against you, with increased heat build-up and finicky base clock controls. Voltage regulation duties are now back on the motherboard, so you can probably expect a slight increase in mobo prices. On the upside, you can look forward to a finer grained tuning experience as you ramp up your motherboard base clock and memory clocks to hit new highs in your overclocking attempts. Full range BCLK tuning is now possible with the removal of the FIVR, and this applies to both desktop and mobile unlocked processors.
You read that right: mobile K SKU processors are now a reality with Skylake, and that's a first for a mobile processor. Among the long list of new mobile processors that Intel is launching today, there is currently only one such SKU on the mobile platform, the Core i7-6820HK. This 45W TDP processor is a true quad-core part with Hyper-Threading technology, 8MB of L3 cache, a base clock of 2.7GHz and a default maximum quad-core Turbo rated at 3.2GHz. Check out how effortlessly you can overclock it in this live demo for the media who attended Intel's IDF 2015:
Note that overclocking freedom is only available on K-SKU processors and not on all Skylake processors. Here's a summary slide showing the extent to which a typical Skylake system can be tweaked:
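For readers new to base clock tuning, the underlying relationship is simple: core frequency is the base clock (BCLK) multiplied by the core multiplier. The sketch below assumes the common nominal BCLK of 100MHz, a figure not stated in this article, and the multiplier and overclocked BCLK values are illustrative rather than recommended settings.

```python
# Core frequency as BCLK x multiplier.
# Assumptions (ours): nominal BCLK of 100MHz; the multiplier and the raised
# BCLK value below are illustrative, not recommended settings.

NOMINAL_BCLK_MHZ = 100.0


def core_frequency_mhz(bclk_mhz: float, multiplier: int) -> float:
    """Effective core clock for a given base clock and multiplier."""
    return bclk_mhz * multiplier


if __name__ == "__main__":
    multiplier = 27  # illustrative: gives 2.7GHz at the nominal base clock
    for bclk in (NOMINAL_BCLK_MHZ, 105.0):
        freq = core_frequency_mhz(bclk, multiplier)
        print(f"BCLK {bclk:.0f}MHz x {multiplier} = {freq / 1000:.3f}GHz")
```

Finer BCLK control matters because it allows frequency steps smaller than a whole multiplier notch, which is where the "finer grained tuning experience" comes from.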