📚 Auto-publish: Add/update 3 blog posts
All checks were successful
Hugo Publish CI / build-and-deploy (push) Successful in 11s
Generated on: Sat Oct 4 17:44:47 UTC 2025 Source: md-personal repository
@@ -1,2 +1,3 @@
image-b25565d6f47e1ba4ce2deca7e161726b86df356e.png|388f43c3f800483aae5ea487e8f45922.png|387cde4274484063c4c7e1f9f37c185a
image-7913a54157c2f4b8d0b7f961640a9c359b2d2a4f.png|ee04876d75d247f9b27a647462555777.png|2371421b04f856f7910dc8b46a7a6fb9
image-79378d40267258c0d8968238cc62bd197dc894fa.png|16d64bdc9cf14b05b7c40c4718b8091b.png|ff2625e796efd7187614b6e0a8542af6
@@ -55,8 +55,7 @@ To understand where performance hits its ceiling, I applied roofline analysis—
The roofline model works by comparing a workload's operational intensity (how many calculations you do per byte of data moved) against the device's balance point. If your operational intensity is too low, you're bottlenecked by memory bandwidth—and as we'll see, that's exactly what happens with LLM inference.
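
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the post: the function names and the device figures in the example (100 TFLOP/s peak compute, 1 TB/s memory bandwidth, a workload at 2 FLOPs per byte) are hypothetical placeholders chosen only to make the arithmetic easy to follow.

```python
def balance_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Operational intensity (FLOPs/byte) at which compute and memory limits meet."""
    return peak_flops / peak_bandwidth


def attainable_flops(peak_flops: float, peak_bandwidth: float,
                     operational_intensity: float) -> float:
    """Roofline ceiling: the lower of the compute roof and the bandwidth roof."""
    return min(peak_flops, peak_bandwidth * operational_intensity)


def is_memory_bound(peak_flops: float, peak_bandwidth: float,
                    operational_intensity: float) -> bool:
    """A workload is memory-bound when its intensity is below the balance point."""
    return operational_intensity < balance_point(peak_flops, peak_bandwidth)


if __name__ == "__main__":
    # Hypothetical device: 100 TFLOP/s compute, 1 TB/s memory bandwidth.
    PEAK_FLOPS = 100e12
    PEAK_BANDWIDTH = 1e12
    # Hypothetical workload: 2 FLOPs per byte moved.
    OI = 2.0

    print("balance point (FLOPs/byte):", balance_point(PEAK_FLOPS, PEAK_BANDWIDTH))
    print("attainable FLOP/s:", attainable_flops(PEAK_FLOPS, PEAK_BANDWIDTH, OI))
    print("memory-bound:", is_memory_bound(PEAK_FLOPS, PEAK_BANDWIDTH, OI))
```

With these placeholder numbers the workload sits far below the balance point, so the bandwidth term sets the ceiling rather than peak compute. Substituting a real device's datasheet figures and a measured operational intensity would give the corresponding estimate for that hardware.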
## The Results: Speed and Efficiency
Binary file not shown. Size after change: 694 KiB.