Building App Sizer: A Developer’s Journey from Problem to Production Tool
How I turned a team frustration into an open-source Android app size analysis tool
Introduction
In my other blog post, I shared how our team at Grab tackled Android app size optimization at scale, achieving a 26% reduction in our app download size. The centerpiece of that journey was App Sizer - an open-source tool that provides detailed insights into APK composition and helps developers identify size reduction opportunities.
App Sizer is now open source and available on GitHub: github.com/grab/app-sizer
But how exactly did I build this tool? What technical challenges did I face, and what design decisions shaped the final product? In this post, I’ll take you through my engineering journey - from the initial naive attempts to the production-ready, open-source tool that teams across the industry now use.
The Problem That Started It All
As I detailed in my previous blog post about app size optimization at scale, I faced a critical challenge while working with our team at Grab: understanding what was driving our Android app’s growing size. While existing tools like Android Studio’s APK Analyzer provided basic insights, they couldn’t answer the fundamental questions I needed for effective optimization:
- Detailed size breakdown - How much comes from our codebase vs libraries?
- Size contribution by teams - Which teams should prioritize optimization?
- Module-wise size contribution - Which modules are the biggest contributors?
- Size contribution by libraries - Are external dependencies driving our size?
- List of large files - What specific files should we investigate?
In 2021, no tool in the Android community could provide this level of attribution. This blog post focuses on the technical journey of building App Sizer to fill that gap.
With the problem clearly defined and no existing solutions available, I set out to build the tool myself. This is the story of that technical journey.
The Discovery Journey
The core insight came to me early: if I could parse both the APK contents and the source artifacts (AAR/JAR files), I could map them together to determine what contributes to the final app size.
The general idea was straightforward:
- Parse AAR & JAR files from modules and libraries to understand what classes and resources they contain
- Parse APK files to extract all classes and files with their actual download sizes
- Map them together - connect APK components back to their source modules
- Calculate size distribution - attribute each byte in the APK to its origin
Parsing AAR and JAR files seemed simple enough - they’re just ZIP archives, straightforward to extract and analyze. The real challenge would be on the APK side.
In practice, it turned out to be significantly more complex than expected.
First Attempt: “How Hard Can APK Parsing Be?”
Like many developers, I started with overconfidence. “APK files are just ZIP archives, right? I’ll just parse them myself!”
This approach quickly revealed its challenges:
- DEX files required special handling for class-level analysis
- R8/ProGuard mapping added another layer of complexity
The biggest challenge was parsing DEX files to extract class-level details. DEX (Dalvik Executable) files have a complex binary format that requires deep understanding of:
DEX File Structure:
┌─────────────────┐
│ Header │ ← Magic numbers, checksums, offsets
├─────────────────┤
│ String IDs │ ← String table references
├─────────────────┤
│ Type IDs │ ← Type descriptors
├─────────────────┤
│ Proto IDs │ ← Method prototypes
├─────────────────┤
│ Field IDs │ ← Field references
├─────────────────┤
│ Method IDs │ ← Method references
├─────────────────┤
│ Class Defs │ ← Class definitions (what I needed!)
├─────────────────┤
│ Data Section │ ← Actual bytecode and data
└─────────────────┘
Each class definition contains size information buried deep in the binary format. Worse yet, when R8/ProGuard is enabled, class names are obfuscated, requiring parsing of mapping files to restore original names:
# R8 mapping file format
com.example.MyClass -> a.b.c:
void methodName() -> a
int fieldName -> b
After hours of wrestling with binary offsets, string tables, and mapping file parsing, I realized I was reinventing a very complex wheel.
Standing on Giants’ Shoulders
Then it hit me: Android Studio already does this perfectly.
The APK Analyzer in Android Studio provides exactly the breakdown we needed:
- Raw file sizes vs. download sizes
- Class-level analysis
- R8 mapping support
And the best part? Android Studio is open source
Instead of building parsing logic from scratch, I could leverage Google’s battle-tested implementation. Diving into the Android tooling source code, I found the exact components I needed:
- DEX parsing:
DexBackedClassDefandDexBackedDexFilefrom theorg.smali:dexlib2library for extracting class information from DEX files - Size calculation:
ApkSizeCalculatorandGzipSizeCalculatorfor calculating both raw and download sizes - Deobfuscation:
shadow.bundletool.com.android.tools.proguard.ProguardMapfor handling R8/ProGuard mapping files
With these proven components, I could focus on the unique value proposition: the mapping logic.
With the right components identified and a clear architectural vision, it was time to translate this concept into working code. Here’s how I structured the implementation.
Implementation Deep Dive
The Architecture Emerges
The insight was to create a mapping tool that connects APK components to their source modules:
APK Components + Module/library Binaries + Project Metadata = Detailed Size Attribution
I structured the core engine around three main stages, each with clear responsibilities:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ INPUT FILES │ │ PARSING │ │ STRUCTURED │
│ │ │ (Stage 1) │ │ DATA │
│ • APK files │───▶│ │───▶│ │
│ • AAR files │ │ • ApkParser │ │ • ApkFileInfo │
│ • JAR files │ │ • AarParser │ │ • AarFileInfo │
│ • Mapping files │ │ • JarParser │ │ • JarFileInfo │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ REPORTS │ │ ANALYSIS │ │ MAPPING │
│ │ │ (Stage 3) │ │ (Stage 2) │
│ • Team sizes │◀───│ │◀───│ │
│ • Module sizes │ │ • ApkAnalyzer │ │ • ClassMapper │
│ • Library sizes │ │ • ModuleAnalyzer│ │ • ResourceMapper│
│ • Large files │ │ • LibAnalyzer │ │ • AssetMapper │
└─────────────────┘ └─────────────────┘ └─────────────────┘
Stage 1: Parsing (parser/ package)
The first stage extracts structured data from all input files. Each file type has its own specialized parser that understands the specific format and extracts relevant information:
// Each file type has its own specialized parser interface
interface ApkFileParser {
fun parseApks(apks: Sequence<File>, proguardMap: ProguardMap): Set<ApkFileInfo>
}
interface AarFileParser {
fun parseAars(files: Sequence<SizerInputFile>): Set<AarFileInfo>
}
interface JarFileParser {
fun parseJars(files: Sequence<SizerInputFile>): Set<JarFileInfo>
}
View the implementations: ApkFileParser, AarFileParser, JarFileParser
Each parser extracts different information based on the file format:
- ApkFileParser: Uses
ApkSizeCalculatorto extract classes (via DEX parsing), resources, assets, and native libraries with both raw and download sizes. Handles ProGuard deobfuscation. - AarFileParser: Parses AAR files (Android libraries) to extract resources, assets, native libs, and embedded JAR files. Since AARs are ZIP files, this is relatively straightforward.
- JarFileParser: Extracts classes and native libraries from JAR files. Like AAR parsing, leverages standard ZIP file handling.
The APK parsing is the most complex, leveraging the Android tooling components I mentioned earlier to handle DEX files and size calculations accurately.
Stage 2: Mapping (analyzer/mapper/ package)
The second stage connects APK components back to their source modules:
interface ComponentMapper {
fun Set<ApkFileInfo>.mapTo(
aars: Set<AarFileInfo>,
jars: Set<JarFileInfo>
): ComponentMapperResult
}
View the interface: ComponentMapper
I implemented specialized mappers for each component type, each handling unique challenges:
ClassComponentMapper: The most complex mapper
- Maps Java/Kotlin classes from APK DEX files to their source AAR/JAR files
- Deals with auto-generated lambda classes (
-$$Lambda$) - Manages synthetic classes created by the compiler
ResourceComponentMapper: Handles Android resources
- Maps resources with version-specific directories (
drawable-v22/) - Handles special characters in resource names (
$bg_network_error__0.xml) - Accounts for renamed resources during the build process
AssetComponentMapper: Straightforward asset mapping
- Direct path matching between APK assets and AAR assets
NativeLibComponentMapper: Handles native libraries
- Normalizes path differences between APK (
/lib/armeabi-v7a/) and AAR (/jni/armeabi-v7a/) - Maps .so files to their source modules
Each mapper returns unmatched components as “no owner” data, which gets attributed to the app module as a fallback.
Stage 3: Analysis (analyzer/ package)
The final stage transforms mapped data into actionable reports. Each analyzer focuses on a specific aspect of the size analysis:
interface Analyzer {
fun process(): Report
}
View the interface: Analyzer
I implemented several specialized analyzers to answer different questions:
ApkAnalyzer: Provides the high-level breakdown
- Separates codebase vs library contributions
- Breaks down by component type:
codebase-kotlin-java,codebase-resources,codebase-assets,android-java-libraries,native-libraries - Calculates the “Others” category for unmatched components
ModuleAnalyzer: Shows module-wise contributions
- Maps contributors to project modules
- Integrates team ownership information
- Handles the special “app” module for unmatched components
LibrariesAnalyzer: Focuses on external dependencies
- Analyzes third-party library contributions
- Helps identify heavy dependencies that could be optimized
LargeFileAnalyzer: Identifies optimization opportunities
- Finds files above a configured size threshold
- Useful for discovering unexpectedly large assets or resources
LibContentAnalyzer: Deep-dive into specific libraries
- Shows what’s inside a particular library dependency
- Helpful for understanding why a library is taking up space
The beauty of this analyzer pattern is extensibility - adding new report types requires only implementing the Analyzer interface, without touching the core parsing or mapping logic.
The three-stage architecture solved the core technical challenges, but I also needed to think about how developers would actually use this tool in practice.
Building for the Future: CLI First, Plugin Ready
When I started building App Sizer, time was limited on the Bonsai project. I needed a working solution quickly, so I focused on a CLI tool first. But I had a vision: eventually, this should be available as a Gradle plugin for seamless Android project integration.
This vision shaped my architectural decisions right at the beginning. Instead of building a monolithic CLI tool, I designed the core engine with abstraction in mind, knowing I’d need to support different interfaces later:
- Immediate need: CLI for our immediate use
- Future vision: Gradle plugin for the broader Android community
The challenge was building the right abstractions without over-engineering. I needed something that worked now but could evolve later.
Clean Separation of Concerns
I solved this by designing the core analysis engine (app-sizer module) to be completely interface-agnostic. The core logic only knows about two contracts:
// Core interfaces that abstract away the client details
interface InputProvider {
fun provideModuleAar(): Sequence<SizerInputFile>
fun provideModuleJar(): Sequence<SizerInputFile>
fun provideLibraryJar(): Sequence<SizerInputFile>
fun provideLibraryAar(): Sequence<SizerInputFile>
fun provideApkFiles(): Sequence<File>
fun provideR8MappingFile(): File?
fun provideTeamMappingFile(): File?
fun provideLargeFileThreshold(): Long
}
interface OutputProvider {
fun provideInfluxDbConfig(): InfluxDBConfig?
fun provideOutPutDirectory(): File
fun provideProjectInfo(): ProjectInfo
fun provideCustomProperties(): CustomProperties
}
View the actual interfaces: InputProvider and OutputProvider
The beauty of this design is that the core engine doesn’t care whether the inputs come from:
- YAML configuration files (CLI)
- Gradle project introspection (Plugin)
- Environment variables, databases, or any future interface
Implementation in Practice
Here’s how each interface implements these contracts:
CLI Implementation:
class AnalyzerCommand : CliktCommand() {
override fun run() {
val config = ConfigYmlLoader().load(settingFile)
DefaultApkGenerator.create(config)
.generate(config.apkGeneration.deviceSpecs)
.forEach { apkDirectory ->
AppSizer(
inputProvider = CliInputProvider(
fileQuery = DefaultFileQuery(),
config = config,
apksDirectory = apkDirectory
),
outputProvider = CliOutputProvider(config, apkDirectory.nameWithoutExtension),
libName = libName,
logger = CliLogger()
).process(reportOption)
}
}
}
View the CLI implementation: AnalyzerCommand, CliInputProvider, CliOutputProvider
The CLI loads YAML configuration and creates providers that handle file system scanning and directory-based artifact discovery.
Gradle Plugin Implementation:
@TaskAction
fun run() {
apkDirectories.forEach { apkDirectory ->
val projectInfo = ProjectInfo(
projectName = project.rootProject.name,
versionName = variantInput.get().versionName ?: "NA",
deviceName = apkDirectory.nameWithoutExtension,
buildType = variantInput.get().name
)
val archiveDependencyStore = ArchiveDependencyManager()
.readFromJsonFile(archiveDepJsonFile.asFile.get())
AppSizer(
inputProvider = PluginInputProvider(
archiveDependencyStore = archiveDependencyStore,
r8MappingFile = r8MappingFile.orNull?.asFile,
apksDirectory = apkDirectory,
largeFileThreshold = largeFileThreshold.get(),
teamMappingFile = teamMappingFile.orNull?.asFile
),
outputProvider = PluginOutputProvider(
influxDBConfig = influxDBConfig.orNull,
projectInfo = projectInfo,
customProperties = customProperties.get(),
outputFolder = outputDirectory.asFile.get()
),
libName = libName.orNull,
logger = PluginLogger(project)
).process(option.get())
}
}
View the Gradle plugin implementation: AppSizeAnalysisTask, PluginInputProvider, PluginOutputProvider
The Gradle plugin automatically discovers dependencies through Gradle’s APIs and creates providers that leverage the project’s existing build configuration.
The Payoff: Smooth Evolution
This forward-thinking architecture paid off when I later built the Gradle plugin. Instead of rewriting the core logic, I only needed to:
- Create new provider implementations that worked with Gradle’s project model
- Design a native Gradle DSL for configuration
- Integrate with Gradle’s task system for proper dependency management
The core analysis engine remained unchanged - exactly as planned.
Benefits of This Architecture
- Avoid Code Duplication: All parsing, mapping, and analysis logic exists once in the core module
- Easy Testing: Core logic can be tested independently of interface concerns
- Future-Proof: Adding new interfaces (Maven plugin, IDE extension, etc.) requires no changes to the core engine
Areas for Improvement
While App Sizer has proven valuable in production, there are several areas where it could be enhanced:
Performance Optimization: Currently, multiple parts of the process can run in parallel, such as parsing independent files or running different types of analysis concurrently. But this hasn’t been implemented yet.
Known Limitations: Like any tool, App Sizer has constraints and edge cases. We maintain a comprehensive list of known limitations in our documentation, covering scenarios where the analysis might not be entirely accurate or complete.
Build System Integration: While our Gradle plugin works well, it doesn’t yet support Gradle’s configuration cache, which can significantly speed up build times. Adding this support would make App Sizer more seamless in modern Android build pipelines.
These improvements represent natural evolution points for the tool, and contributions in these areas would be particularly welcome from the community.
Conclusion
Building App Sizer taught me that sometimes the best engineering solutions come from smart composition rather than starting from scratch. By leveraging Android Studio’s battle-tested parsing logic and focusing on the unique challenge of attribution mapping, I was able to create a tool that provides insights no other solution offered in 2021.
Real-World Impact
Three mobile projects in my company have adopted App Sizer. Two via the CLI and one via the Gradle plugin. This validates the interface abstraction strategy: the same core engine serves different workflows through flexible integration options.
Since open-sourcing the project last year, the response has been encouraging. With ~200 stars on GitHub and growing adoption across the Android community, I hope App Sizer is helping teams beyond Grab optimize their app sizes and understand their build composition.
If you’re facing similar challenges with Android app size analysis, I encourage you to try App Sizer. Whether you need the CLI for custom build systems or the Gradle plugin for seamless Android integration, the tool is ready to help you understand exactly what’s contributing to your app’s size.
The project is open source and welcomes contributions. After all, the best developer tools are built by the community, for the community.
This blog post was written with the assistance of Claude Code to speed up the writing process.
Comments