diff --git a/.image_mappings/benchmarking-llms-on-jetson-orin-nano.txt b/.image_mappings/benchmarking-llms-on-jetson-orin-nano.txt
index 7094d6e..3cbdbc4 100644
--- a/.image_mappings/benchmarking-llms-on-jetson-orin-nano.txt
+++ b/.image_mappings/benchmarking-llms-on-jetson-orin-nano.txt
@@ -1,2 +1,3 @@
 image-b25565d6f47e1ba4ce2deca7e161726b86df356e.png|388f43c3f800483aae5ea487e8f45922.png|387cde4274484063c4c7e1f9f37c185a
 image-7913a54157c2f4b8d0b7f961640a9c359b2d2a4f.png|ee04876d75d247f9b27a647462555777.png|2371421b04f856f7910dc8b46a7a6fb9
+image-79378d40267258c0d8968238cc62bd197dc894fa.png|16d64bdc9cf14b05b7c40c4718b8091b.png|ff2625e796efd7187614b6e0a8542af6
diff --git a/content/posts/benchmarking-llms-on-jetson-orin-nano.md b/content/posts/benchmarking-llms-on-jetson-orin-nano.md
index 8e1a0b1..793754e 100644
--- a/content/posts/benchmarking-llms-on-jetson-orin-nano.md
+++ b/content/posts/benchmarking-llms-on-jetson-orin-nano.md
@@ -55,8 +55,7 @@ To understand where performance hits its ceiling, I applied roofline analysis—
 
 The roofline model works by comparing a workload's operational intensity (how many calculations you do per byte of data moved) against the device's balance point. If your operational intensity is too low, you're bottlenecked by memory bandwidth—and as we'll see, that's exactly what happens with LLM inference.
 
-![S3 File](/images/benchmarking-llms-on-jetson-orin-nano/388f43c3f800483aae5ea487e8f45922.png)
-
+![S3 File](/images/benchmarking-llms-on-jetson-orin-nano/16d64bdc9cf14b05b7c40c4718b8091b.png)
 
 ## The Results: Speed and Efficiency
 
diff --git a/static/images/benchmarking-llms-on-jetson-orin-nano/16d64bdc9cf14b05b7c40c4718b8091b.png b/static/images/benchmarking-llms-on-jetson-orin-nano/16d64bdc9cf14b05b7c40c4718b8091b.png
new file mode 100644
index 0000000..8477b42
Binary files /dev/null and b/static/images/benchmarking-llms-on-jetson-orin-nano/16d64bdc9cf14b05b7c40c4718b8091b.png differ
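
The roofline comparison described in the paragraph above can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: `PEAK_OPS`, `BANDWIDTH`, and the decode operational-intensity estimate are assumed back-of-the-envelope figures (nominal Orin Nano specs), not numbers taken from the post's benchmarks.

```python
# Roofline sketch: attainable throughput = min(peak compute, bandwidth * intensity).
# PEAK_OPS and BANDWIDTH are assumed nominal Orin Nano specs, not measured values.

PEAK_OPS = 40e12    # ~40 TOPS peak compute (assumed nominal figure)
BANDWIDTH = 68e9    # ~68 GB/s memory bandwidth (assumed nominal figure)

def attainable(oi_ops_per_byte: float) -> float:
    """Roofline ceiling for a workload with the given operational intensity."""
    return min(PEAK_OPS, BANDWIDTH * oi_ops_per_byte)

# The balance point: the operational intensity at which the compute ceiling
# and the bandwidth ceiling meet.
balance_point = PEAK_OPS / BANDWIDTH
print(f"Balance point: ~{balance_point:.0f} ops/byte")

# Token-by-token LLM decode touches every weight once per token:
# roughly 2 ops per weight and ~2 bytes per FP16 weight -> ~1 op/byte.
oi_decode = 2.0 / 2.0
print(f"Decode ceiling: ~{attainable(oi_decode) / 1e9:.0f} GOPS "
      f"of a {PEAK_OPS / 1e12:.0f} TOPS peak")
```

With these assumed figures, decode sits at roughly 1 op/byte against a balance point of several hundred ops/byte, so its attainable throughput is capped by bandwidth at a small fraction of peak compute. That is the memory-bound regime the paragraph describes.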