Repository
- https://github.com/vuejs/vue
- https://github.com/rust-lang-nursery/rust-wasm
- https://github.com/WebAssembly/binaryen
What Will I Learn?
- Mix Vue project with Rust code
- Analyzing WebAssembly code
- Using another loader to reduce .wasm file size
Requirements
- Basic understanding of Rust
- Basic understanding of HTML and Typescript
- Basic understanding of Vuejs and Class-Style Vue Components
- Some knowledge about webpack and/or vue.config.js
- Install vue-cli and yarn
- Install rustup
- Install some code-editor/IDE (strongly suggest to use VSCode + Vetur)
- Web browser that can open https://webassembly.studio/
Difficulty
- Intermediate
Tutorial Contents
In my previous tutorial on mixing Vue with Rust, we compiled Rust code into WebAssembly with a bundle size of about 2 MB, and then reduced it to about 580 kb (as shown in Table 1) with the help of wasm-gc, which removes unneeded exports, imports, functions, etc. However, we can shrink it further by using Binaryen and enabling Link Time Optimization (LTO).
options | Size | Gzipped |
---|---|---|
none | 1962.12 kb | 256.52 kb |
release | 1958.73 kb | 256.00 kb |
gc | 581.94 kb | 256.00 kb |
gc + release | 578.85 kb | 91.54 kb |
In this tutorial, we will try to shrink the wasm code generated from three Rust sources. We will also analyze the wasm code generated at each step with the help of webassembly.studio, which can visualize the call graph of our wasm code as shown in the video below [1].
Preparation
Before we begin to compile our Rust code, we need to restructure the project so that each of the three Rust sources compiles into its own wasm file, which makes the results easier to analyze and compare. We use three Rust sources so that we can compare what happens when an external package/crate is used. First, we restructure the project as shown in Figure 1.
.
├── src
│   ├── components
│   │   └── Calculator.vue
│   ├── libs
│   │   ├── algebra-matrix2x2
│   │   │   ├── Cargo.toml ⬅️ `nalgebra` only specified here
│   │   │   └── calculator.rs
│   │   ├── arithmatic
│   │   │   ├── Cargo.toml
│   │   │   └── calculator.rs
│   │   └── empty
│   │       ├── Cargo.toml
│   │       └── calculator.rs
│   ├── App.vue
│   └── main.ts
├── Cargo.toml ⬅️ Rust workspace config
├── package.json
└── vue.config.js
1. new project structure
[workspace]
members = [
  "src/libs/algebra-matrix2x2",
  "src/libs/arithmatic",
  "src/libs/empty"
]

[profile.release]
lto = false # ⬅️ default is `false`
2. Content of ./Cargo.toml
[package]
name = "folder name"
version = "0.1.0"
authors = ["your name <email@host.domain>"]

[lib]
crate-type = ["cdylib"]
path = "calculator.rs"

# dependencies only written for "algebra-matrix2x2"
[dependencies]
nalgebra = { version = "0.15", default-features = false, features = [ "alloc" ] }
3. Content of ./src/libs/**/Cargo.toml
As you may have figured out from Figure 1.1, the Rust code is organized as a Cargo workspace with four Cargo.toml files. The one in the project root declares the workspace and configures the release profile, as shown in Figure 1.2. The other three define the dependencies of the Rust code that resides in the same folder, as shown in Figure 1.3. You can define dependencies that will be compiled into wasm code in each of those Cargo.toml files (not in the workspace one), but in this project only algebra-matrix2x2 has a dependency: the nalgebra crate. As specified in the nalgebra documentation, we compile it without libstd (by specifying default-features = false) since, at the time of writing, Rust's libstd is not yet fully supported on the wasm32-unknown-unknown target. We also enable the alloc feature because we will use nalgebra types that manage heap-allocated values.
In the previous tutorial, we used wasm-gc and the default release profile for the wasm32-unknown-unknown target to reduce the bundle size. However, based on the rustwasm documentation for building Conway's Game of Life [3], we can reduce the bundle size further by:
- enabling LTO (on the nightly channel at the time of writing, the wasm target by default already optimizes for size, aborts on panic, and disables the debug option)
- using wasm-gc
- using binaryen
- using wasm-snip (not used here since our implementations are simple; also, it does not always work)
Since we have already installed wasm-gc, we only need to install Binaryen. Binaryen is a compiler and toolchain infrastructure library for WebAssembly, written in C++. Most of its tools are CLI apps, and it can also be embedded into JavaScript (binaryen.js). Luckily, there is a webpack loader for Binaryen that we can use in this project. To install it, run
yarn add binaryen-loader --dev
That will install binaryen-loader in the devDependencies section. Since the compilation takes a lot of RAM (we are compiling three or more crates) and Node.js by default limits memory usage to about 2 GB, we need to raise that limit (in my case I set it to 5 GB) by changing the yarn build command in package.json.
"build": "node --max_old_space_size=5120 node_modules/@vue/cli-service/bin/vue-cli-service build",
After that, we need to change the implementation as shown in Code Change 1 to manually test the logic to make sure it's runnable.
@Prop() module!: string

@Watch('module')
async changeModule(mod: string) {
  let loadWasm
  if (mod === 'algebra') {
    loadWasm = await import(/*webpackChunkName: 'calculator.algebra'*/'@/libs/algebra-matrix2x2/calculator.rs')
  } else if (mod === 'arithmatic') {
    loadWasm = await import(/*webpackChunkName: 'calculator.arithmatic'*/'@/libs/arithmatic/calculator.rs')
  } else {
    loadWasm = await import(/*webpackChunkName: 'calculator.empty'*/'@/libs/empty/calculator.rs')
  }
  this.wasm = await loadWasm.default()
}

async mounted() {
  await this.changeModule(this.module)
  await this.changeOperation(this.operation)
}
1. Some part of ./src/components/Calculator.vue
<select class="title" v-model="operation">
  <option value="arithmatic">arithmatic</option>
  <option value="algebra">algebra (matrix 2x2 diagonal operation)</option>
</select>

<input type="range" name="x" v-model.number="x" />

<Calculator class="center" :a="x" :b="y" :operation="selected" :module="operation">
  <select v-model="selected">
    <option value="add">add</option>
    <option value="substract">substract</option>
    <option value="multiply">multiply</option>
    <option v-if="operation === 'algebra'" value="dot">dot</option>
    <option v-if="operation === 'algebra'" value="tensor">tensor</option>
    <option v-if="operation === 'arithmatic'" value="divide">divide</option>
    <option v-if="operation === 'arithmatic'" value="power">power</option>
    <option v-if="operation === 'arithmatic'" value="remainder">remainder</option>
  </select>
</Calculator>
2. Some part of ./src/App.vue
In Code Change 1.1, we use a webpack feature called code splitting to separate each wasm implementation into an independent JS file. We specify the output filename with the webpackChunkName property, written as an inline comment. We also use dynamic imports simply because they make the code easier to write (not to make it faster or more responsive). In Code Change 1.2, we add a mechanism to switch between the arithmatic implementation and the algebra implementation. The result is shown in Figure 2.
Rust Code
With the new project structure in place, we can begin to write the Rust code that will be compiled into wasm code. We split it into three Rust files because we want to observe how each optimization behaves on different inputs. One of them is an empty .rs file that contains no code at all, just to make the comparison clear, as shown in Code Change 2.
// NOTHING
#[no_mangle]
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

#[no_mangle]
pub fn substract(a: i32, b: i32) -> i32 {
    a - b
}

#[no_mangle]
pub fn multiply(a: i32, b: i32) -> i32 {
    a * b
}

#[no_mangle]
pub fn divide(a: i32, b: i32) -> i32 {
    a / b
}

// `^` is bitwise XOR in Rust, not exponentiation, so `wrapping_pow` is used here
// (the sizes reported in this tutorial were measured with `a ^ b`).
#[no_mangle]
pub fn power(a: i32, b: i32) -> i32 {
    a.wrapping_pow(b as u32)
}

#[no_mangle]
pub fn remainder(a: i32, b: i32) -> i32 {
    a % b
}
extern crate nalgebra as na;
use na::DMatrix;

#[no_mangle]
pub fn add(a: f32, b: f32) -> f32 {
    let matrix_a = DMatrix::from_diagonal_element(2, 2, a);
    let matrix_b = DMatrix::from_diagonal_element(2, 2, b);
    (matrix_a + matrix_b).determinant()
}

#[no_mangle]
pub fn substract(a: f32, b: f32) -> f32 {
    let matrix_a = DMatrix::from_diagonal_element(2, 2, a);
    let matrix_b = DMatrix::from_diagonal_element(2, 2, b);
    (matrix_a - matrix_b).determinant()
}

#[no_mangle]
pub fn multiply(a: f32, b: f32) -> f32 {
    let matrix_a = DMatrix::from_diagonal_element(2, 2, a);
    let matrix_b = DMatrix::from_diagonal_element(2, 2, b);
    (matrix_a * matrix_b).determinant()
}

#[no_mangle]
pub fn dot(a: f32, b: f32) -> f32 {
    let matrix_a = DMatrix::from_diagonal_element(2, 2, a);
    let matrix_b = DMatrix::from_diagonal_element(2, 2, b);
    matrix_a.dot(&matrix_b)
}

#[no_mangle]
pub fn tensor(a: f32, b: f32) -> f32 {
    let matrix_a = DMatrix::from_diagonal_element(2, 2, a);
    let matrix_b = DMatrix::from_diagonal_element(2, 2, b);
    matrix_a.kronecker(&matrix_b).determinant()
}
Code Change 2.2 is the arithmatic implementation from our previous tutorial; every function accepts i32 arguments and returns i32. In Code Change 2.3, we implement some 2x2 matrix operations, such as the dot product and the tensor (Kronecker) product. Notice that every operation returns the determinant of the resulting matrix except the dot product. The reason is simply to make each function return a single f32 value. The dot function needs no determinant because the result of a dot product is already a real number (represented in machine code as a floating-point value), not a matrix (or array).
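Because both inputs are diagonal 2x2 matrices built from a single value, each exported function in Code Change 2.3 has a simple closed form. The sketch below is not part of the project and does not use nalgebra; it only restates the math in plain Rust (assuming A = diag(a, a) and B = diag(b, b)) so you can sanity-check the values returned by the wasm module. The helper `det_diag2` is a name made up for this illustration.

```rust
/// Determinant of the 2x2 diagonal matrix diag(d, d).
fn det_diag2(d: f32) -> f32 {
    d * d
}

// Closed forms for A = diag(a, a) and B = diag(b, b).
fn add(a: f32, b: f32) -> f32 {
    det_diag2(a + b) // det(A + B) = (a + b)^2
}

fn substract(a: f32, b: f32) -> f32 {
    det_diag2(a - b) // det(A - B) = (a - b)^2
}

fn multiply(a: f32, b: f32) -> f32 {
    det_diag2(a * b) // A * B = diag(ab, ab)
}

fn dot(a: f32, b: f32) -> f32 {
    2.0 * a * b // sum of element-wise products: ab + 0 + 0 + ab
}

fn tensor(a: f32, b: f32) -> f32 {
    (a * b).powi(4) // kron(A, B) is the 4x4 matrix diag(ab, ab, ab, ab)
}

fn main() {
    let (a, b) = (2.0, 3.0);
    println!("add       = {}", add(a, b)); // 25
    println!("substract = {}", substract(a, b)); // 1
    println!("multiply  = {}", multiply(a, b)); // 36
    println!("dot       = {}", dot(a, b)); // 12
    println!("tensor    = {}", tensor(a, b)); // 1296
}
```

If the values shown in the Vue component differ from these for the same inputs, the wrong module or operation is probably being loaded.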
Default profile.release (Disable LTO)
In our previous tutorial, we used the default release build configuration for the wasm32-unknown-unknown target, which corresponds to this manifest:
opt-level = 's' # optimize for shrinking
debug = false
rpath = false # relative path
lto = false # disable link-time optimization
debug-assertions = false
codegen-units = 16
panic = 'abort' # abort at panic!
incremental = false # disable incremental compilation
overflow-checks = false # disable check for overflow bit (since wasm run on JS VM)
If we then compile it (`yarn build`), we get the result shown in Table 2. The WAsm Size is the size of the `.wasm` file found in `./target/wasm32-unknown-unknown/release` after the compilation is done.
Table 2. wasm32-unknown-unknown and default profile.release
Implementation | JS Size | GZipped | WAsm Size |
---|---|---|---|
algebra | 1945.44 kb | 252.25 kb | 651 KB |
arithmatic | 1895.43 kb | 245.86 kb | 635 KB |
empty | 1893.56 kb | 245.58 kb | 634 KB |
As shown in Table 2, all files (especially the JS files) are bloated; even the empty Rust code takes about 1.85 MiB. If we take the wasm code of empty and convert it into its s-expression (symbolic expression) text format (.wat), it has 141,772 LoC (lines of code), as shown in Figure 3. If we run twiggy on it, we find 5,295 items that each have a shallow size of less than 0.1%, all of which are garbage instructions. We may also notice that 34.45% of the shallow bytes belong to the "function names" subsection, which is part of the name section. The purpose of the name section is to attach printable names to definitions in a module, which can be used e.g. by a debugger or when parts of the module are rendered in text form. Like any custom section, it does not contribute to, or otherwise affect, the WebAssembly semantics, and it may be ignored by an implementation. However, custom sections provide useful metadata that implementations can use to improve the user experience or take compilation hints. Since the file is too big and has too many lines, we can't view its call graph in webassembly.studio 😂.
Figure 3. empty with lto=false
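To see how much of a `.wasm` file is taken up by custom sections such as the name section, you can walk the binary yourself. The following is only a minimal sketch of the section layout (magic, version, then `id | LEB128 size | payload` per section); it is not how twiggy works internally, and the function names `read_leb_u32` and `list_sections` are made up for this illustration. The bytes in `main` are a hand-written toy module with a fake payload in its "name" section, not output from this project.

```rust
/// Read an unsigned LEB128 integer, returning (value, bytes consumed).
fn read_leb_u32(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        result |= u32::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}

/// List (section id, custom section name, payload size) for a wasm binary.
/// Section id 0 marks a custom section, whose payload starts with its name.
fn list_sections(wasm: &[u8]) -> Option<Vec<(u8, Option<String>, u32)>> {
    if wasm.len() < 8 || !wasm.starts_with(b"\0asm") {
        return None;
    }
    let mut pos = 8; // skip the 4-byte magic and the 4-byte version
    let mut sections = Vec::new();
    while pos < wasm.len() {
        let id = wasm[pos];
        pos += 1;
        let (size, used) = read_leb_u32(wasm.get(pos..)?)?;
        pos += used;
        let end = pos.checked_add(size as usize)?;
        let payload = wasm.get(pos..end)?;
        let name = if id == 0 {
            let (len, skip) = read_leb_u32(payload)?;
            let raw = payload.get(skip..skip + len as usize)?;
            Some(String::from_utf8_lossy(raw).into_owned())
        } else {
            None
        };
        sections.push((id, name, size));
        pos = end;
    }
    Some(sections)
}

fn main() {
    // Toy module: header, an empty type section, and a fake "name" custom section.
    let wasm: Vec<u8> = vec![
        0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm", version 1
        0x01, 0x01, 0x00, // type section with zero entries
        0x00, 0x07, 0x04, b'n', b'a', b'm', b'e', 0xAA, 0xBB, // custom section
    ];
    for (id, name, size) in list_sections(&wasm).expect("malformed wasm") {
        println!("section id={} name={:?} payload={} bytes", id, name, size);
    }
}
```

Pointing `list_sections` at the bytes of a real file from `./target/wasm32-unknown-unknown/release` (read with `std::fs::read`) should show a custom section called "name" whose payload size can be compared against the total file size.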
Enable Link Time Optimization
Link-Time Optimization (LTO) allows the compiler to take all the libraries and crates into account when optimizing them, and to optimize them as a single unit rather than individually. For example, inlining across crate boundaries becomes possible. This typically increases performance, but in some circumstances it drastically increases compile time [2]. Note that LLVM bitcode, unlike WebAssembly code, was designed for temporary on-disk serialization of the IR for link-time optimization, and not for stability or compressibility (although it does have some features for both of those) [6]. To enable LTO, we need to change the release build configuration in ./Cargo.toml:
[profile.release]
lto = true
When we run `yarn build`, we will get results as shown in Table 3. For WAsm Size, you can get it after compilation is done in folder `./target/wasm32-unknown-unknown/release` at files with extension `.wasm`.
Implementation | JS Size | GZipped | WAsm Size |
---|---|---|---|
algebra | 60.13 kb | 10.05 kb | 21 KB |
arithmatic | 4.77 kb | 1.38 kb | 1.5 KB |
empty | 0.76 kb | 0.44 kb | 125 B |
In Table 3, we see a huge reduction in size from enabling LTO. Interestingly, the WAsm size of empty is not 0 bytes, though it is close. If we take a peek at empty, we see something like Figure 4.
Figure 4. empty with lto=true
In Figure 4, we see that the compiler generates code to set up the memory and the table even though calculator.rs is empty. The confusing part is that it contains the function rust_eh_personality. This function is used by the failure mechanisms of the compiler. It is often mapped to a compiler-specific (e.g. GCC) personality function, but crates that do not trigger a panic can be assured that this function is never called. The corresponding lang item is called eh_personality. The function rust_eh_personality also appears in arithmatic and algebra. There is another interesting thing in arithmatic, shown in Figure 5.
Figure 5. arithmatic with lto=true
In Figure 5, we see that divide and remainder can call panic code. This makes sense because dividing by zero is not a valid operation. In other words, calling divide(x, 0) or remainder(x, 0) will cause a runtime error.
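If you would rather avoid that runtime error, the standard library's checked integer operations return `None` instead of panicking. The sketch below is not part of the project; the names `safe_divide`, `safe_remainder`, and `divide_or` are made up for this illustration, and whether it changes the generated wasm size was not measured here. Since an `Option<i32>` is not a plain wasm value, the last helper shows one possible way to fall back to an ordinary i32.

```rust
/// Division that returns `None` for `b == 0` and for the overflowing
/// case `i32::MIN / -1`, instead of panicking.
pub fn safe_divide(a: i32, b: i32) -> Option<i32> {
    a.checked_div(b)
}

/// Remainder with the same guarantees as `safe_divide`.
pub fn safe_remainder(a: i32, b: i32) -> Option<i32> {
    a.checked_rem(b)
}

/// Variant that always yields a plain i32 by substituting a fallback value.
pub fn divide_or(a: i32, b: i32, fallback: i32) -> i32 {
    a.checked_div(b).unwrap_or(fallback)
}

fn main() {
    println!("{:?}", safe_divide(7, 2)); // Some(3)
    println!("{:?}", safe_divide(7, 0)); // None
    println!("{:?}", safe_remainder(7, 0)); // None
    println!("{}", divide_or(7, 0, -1)); // -1
}
```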
Enable wasm-gc
Although LTO can remove almost all garbage code, some of it survives because the functions are still in use at compile time. They can't be removed during compilation since they are used by the failure mechanisms of the compiler, so the code would not compile without them. This is where wasm-gc comes in: it removes all unneeded exports, imports, functions, etc. after the compilation. It has a hardcoded blacklist of exports which, if found, are forcibly not exported from the result (and then they're naturally gc'd unless they're otherwise referenced) [4]. In this section, we run wasm-gc after compiling our code with LTO enabled to reduce the bundle size further, and get the result shown in Table 4. For more information on how to install and enable wasm-gc, see my previous tutorial.
Table 4. wasm-gc after compiling with LTO enabled
Implementation | JS Size | GZipped | WAsm Size |
---|---|---|---|
algebra | 53.04 kb | 9.56 kb | 19 KB |
arithmatic | 4.25 kb | 1.28 kb | 1.3 KB |
empty | 0.53 kb | 0.37 kb | 55 B |
In Table 4, the wasm files shrink by 70 B (empty), about 0.2 KB (arithmatic), and about 2 KB (algebra) compared with Table 3. Not really significant, but at least it helps remove some functions that are never called, as shown in Figure 6.
In Figure 6, we see that the $rust_eh_personality function is removed along with its type declaration, type $t0 (func). We also see that the unused global variable is removed (Line 6). The same behavior shows up in algebra when we generate the call graph, as shown in Figure 7.
In Figure 7, we see that not only $rust_eh_personality is removed, but also $memcmp, $memset, and $memmove. This is because all of those functions are listed in the hardcoded blacklist. The exception is the $memcpy function, which is not removed because it is used by another function.
Using Binaryen
Binaryen is a compiler and toolchain infrastructure library for WebAssembly, written in C++. It goes much further than LLVM's WebAssembly backend does. Binaryen's optimizer has many passes that can improve code very significantly. One specific area of focus is WebAssembly-specific optimizations (those that general-purpose compilers might not do), which you can think of as wasm minification, similar to minification for JavaScript, CSS, etc., all of which are language-specific [6]. The rustwasm book states that we can get 15-20% savings on code size, often with runtime speedups at the same time. Luckily, there is a webpack loader for Binaryen called binaryen-loader, which uses binaryen.js under the hood. Since we have installed binaryen-loader in the previous step, we now only need to enable it in vue.config.js as shown in Code Change 3.
Code Change 3. binaryen-loader
rules: [{
  test: /\.rs$/,
  use: [{
    loader: 'wasm-loader'
  }, {
    loader: 'binaryen-loader'
  }, {
    loader: 'rust-native-wasm-loader',
    options: {
      release: process.env.NODE_ENV === 'production',
      gc: process.env.NODE_ENV === 'production'
    }
  }]
}]
1. add (chain) binaryen-loader between wasm-loader and rust-native-wasm-loader
2. Illustration how chaining the loader works
{
  optimization: {
    level: 2, // -O2
    shrinkLevel: 1 // -Os
  },
  transformation: {
    passes: [
      "duplicate-function-elimination",
      "inlining-optimizing",
      "remove-unused-module-elements",
      "memory-packing"
    ]
  },
  debug: false
}
3. default options that applied based on binaryen.js docs and this line
In Code Change 3.1, we simply add (chain) binaryen-loader between wasm-loader and rust-native-wasm-loader, since binaryen-loader takes wasm code and then spits out a minified version of it. As you can see in Code Change 3.2, rust-native-wasm-loader takes the Rust code, which is in text (UTF-8) format, compiles it into wasm code in binary format using the wasm32-unknown-unknown target, and then removes unwanted code/functions with wasm-gc. Under the hood, binaryen-loader uses binaryen.js with the default options listed in Code Change 3.3, and it also uses the default passes defined in the Binaryen code base. A pass is a function that performs some sort of transformation on wasm binary code. In Table 5, we see that chaining binaryen-loader reduces the bundle size further.
Implementation | JS Size | GZipped | WAsm Size |
---|---|---|---|
algebra | 47.95 kb | 8.80 kb | 18 KB |
arithmatic | 3.11 kb | 1.03 kb | 953 B |
empty | 0.49 kb | 0.35 kb | 38 B |
In Table 5, we see that the JS size of the arithmatic code drops from 4.25 kb to 3.11 kb, which is a 26.82% reduction. The empty code also loses 17 bytes because (table $T0 1 1 anyfunc) (the code for the dummy table initialization) is removed by Binaryen. However, we only get a 9.6% size reduction (53.04 kb -> 47.95 kb) in the algebra code. If we additionally use the remove-memory and post-emscripten passes, as in Code Change 4, we get the result shown in Table 6.
{
loader: 'binaryen-loader',
options: {
transformation: {
passes: [
'post-emscripten',
'remove-memory'
]
}
}
}
Implementation | JS Size | GZipped | WAsm Size |
---|---|---|---|
algebra | 42.91 kb | 8.05 kb | 16 KB |
arithmatic | 2.07 kb | 0.84 kb | 613 B |
empty | 0.49 kb | 0.35 kb | 38 B |
In Table 6, as expected, the empty code doesn't change at all because there is nothing else to remove, while the arithmatic code shrinks significantly, by about 33.44% (3.11 kb -> 2.07 kb). The algebra code also shows a notable reduction: its WAsm size loses about 2 KB. If we convert the wasm code into s-expressions, we find some interesting details, as shown in Figure 8 and Figure 9.
Figure 8. remove-memory before (left) and after (right)
In Figure 8, we can see that some strings stored in global memory through data sections are removed. According to the WebAssembly specs, data sections allow a string of bytes to be written at a given offset at instantiation time and are similar to the .data sections in native executable formats. Since these strings are not used by any function, it makes sense to remove them.
Figure 9. post-emscripten before (left) and after (right)
In Figure 9, we see that the post-emscripten pass can spot code that can be simplified into a single expression. According to the WebAssembly specs, memory is just a large array of bytes that can grow over time. WebAssembly contains instructions like i32.load and i32.store for reading from and writing to linear memory. Instead of declaring a constant and performing an i32.add operation, it's better to access memory directly with an offset equal to that constant value.
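The idea behind that simplification can be sketched with a simulated linear memory. This is not Binaryen code; `i32_load` is a made-up helper that mimics how a load computes its effective address as `addr + offset`. It shows that adding a constant before the load and passing the same constant as the static offset read the same bytes, assuming the addition does not wrap around (which is the case an optimizer has to rule out).

```rust
/// Simulated `i32.load offset=<offset>`: reads 4 little-endian bytes at
/// the effective address `addr + offset`, or `None` when out of bounds.
fn i32_load(memory: &[u8], addr: u32, offset: u32) -> Option<i32> {
    let start = (addr as usize).checked_add(offset as usize)?;
    let end = start.checked_add(4)?;
    let bytes = memory.get(start..end)?;
    Some(i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

fn main() {
    // A tiny 64-byte "linear memory" with the value 1234 stored at address 24.
    let mut memory = vec![0u8; 64];
    memory[24..28].copy_from_slice(&1234i32.to_le_bytes());

    let base: u32 = 8;

    // Before: (i32.load (i32.add (base) (i32.const 16)))
    let before = i32_load(&memory, base.wrapping_add(16), 0);

    // After: (i32.load offset=16 (base))
    let after = i32_load(&memory, base, 16);

    assert_eq!(before, after);
    println!("before = {:?}, after = {:?}", before, after);
}
```

The second form needs no `i32.const` and no `i32.add` instruction, which is where the saved bytes come from.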
Conclusion
In summary, by enabling Link Time Optimization, using wasm-gc to remove garbage functions, and using Binaryen with the right passes, we reduced the JS bundle size from roughly 1.9 MB per implementation to 42.91 kb (algebra), 2.07 kb (arithmatic), and 0.49 kb (empty), as shown in Table 7.
Implementation | lto=false | lto=true | LTO + wasm-gc | LTO + GC + Binaryen(default) | LTO + GC + Binaryen(default + post-emscripten + remove-memory ) |
---|---|---|---|---|---|
algebra | 1945.44 kb | 60.13 kb | 53.04 kb | 47.95 kb | 42.91 kb |
arithmatic | 1895.43 kb | 4.77 kb | 4.25 kb | 3.11 kb | 2.07 kb |
empty | 1893.56 kb | 0.76 kb | 0.53 kb | 0.49 kb | 0.49 kb |
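The percentages quoted in this tutorial can be reproduced from the table values with a one-line formula, (before - after) / before * 100. The sketch below simply applies it to a few of the JS sizes above; `percent_reduction` is a name made up for this illustration.

```rust
/// Size reduction in percent when going from `before` to `after`.
fn percent_reduction(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    // (label, size before in kb, size after in kb), taken from the tables above.
    let steps = [
        ("arithmatic: wasm-gc -> Binaryen (default)", 4.25, 3.11),
        ("algebra: wasm-gc -> Binaryen (default)", 53.04, 47.95),
        ("arithmatic: default -> extra passes", 3.11, 2.07),
        ("algebra: lto=false -> all optimizations", 1945.44, 42.91),
    ];
    for (label, before, after) in steps.iter() {
        println!("{}: {:.2}%", label, percent_reduction(*before, *after));
    }
}
```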
Thanks to the Rustaceans on the Rust Discord server who mentioned that it can be optimized further (and who also shared my previous tutorial on Reddit, since it is blocked in my country 😂). It seems I know the tricks now. If anyone has questions or suggestions, feel free to comment below or mention me on any Discord channel where you find my username (as long as it's in an appropriate channel/category and I'm online 🙂).
Curriculum
- Mix Rust Code (WebAssembly) with Vue Component #basic
- Related Tutorials:
References
- Sneak Peek at WebAssembly Studio
- Clap.rs - Tuning Your Wight Loss vs Performance
- Building Conway's Game of Life Tutorial using Rust
- wasm-gc issue#4
- wasm-intro
- binaryen-loader, binaryen.js, binaryen
Proof of Work Done
https://github.com/DrSensor/example-vue-component-rust/commits/master
compile results: https://webassembly.studio/?f=a53wrgwnhme