zip.hpp: Zip Viewer and Extractor Library for IoT

19 Jul 2021 | MIT license | 4 min read
Browse and extract zips in constrained memory environments
Finally, a zip archive viewer and extractor for IoT devices!

Introduction

I needed to be able to extract zip files under the Arduino framework and other IoT frameworks. The problem is, after hunting, I could not find an adequate offering that would work in constrained memory environments, much less under the Arduino framework.

Building this Mess

You'll need VS Code with PlatformIO installed. You'll need a connected ESP32. You need to upload the filesystem image before the first time you build in order to set up the SPIFFS partition. You don't need to do it again unless you switch out the ESP32 you're using. You should be able to use zip.hpp/stream.hpp/bits.hpp just about anywhere in terms of platform, but I've only tested them on ESP32s.

Coding this Mess

Reading zip archives is pretty easy, if a little odd, since a lot of the relevant information you need is relative to the end of the file rather than the beginning. Unfortunately, that makes it impossible to process an archive as a forward-only stream. The other thing about it is that inflating (decompressing DEFLATE data) requires being able to look backward up to 32kB into your output (yes, the decompressed data you've already written), because back-references in the compressed data copy from it to produce further output.
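To make the "relative to the end of the file" point concrete, here is a minimal sketch of locating the End of Central Directory (EOCD) record by scanning backward for its signature. It works on an in-memory buffer for simplicity, and `find_eocd` and `eocd_entry_count` are illustrative names, not functions from zip.hpp. A real reader would seek near the end of the file instead, and should also validate the record's comment length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Signature that marks the EOCD record. It is stored little-endian,
// so the file contains the bytes 'P' 'K' 0x05 0x06.
static const uint32_t EOCD_SIGNATURE = 0x06054b50u;
// Size of the fixed part of the EOCD; a variable-length comment may follow.
static const size_t EOCD_MIN_SIZE = 22;

// Read little-endian 16-bit and 32-bit values from a byte buffer.
static uint16_t read_u16(const uint8_t* p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t read_u32(const uint8_t* p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Hypothetical helper: scan backward from the end of `data` for the EOCD.
// Returns the record's offset, or -1 if no signature is found.
long find_eocd(const uint8_t* data, size_t size) {
    assert(data != nullptr);
    if (size < EOCD_MIN_SIZE) return -1;
    // Start at the last position where a full record could still fit.
    for (size_t pos = size - EOCD_MIN_SIZE + 1; pos-- > 0;) {
        if (read_u32(data + pos) == EOCD_SIGNATURE) return (long)pos;
    }
    return -1;
}

// The total number of central directory entries sits at offset 10.
uint16_t eocd_entry_count(const uint8_t* eocd) {
    return read_u16(eocd + 10);
}
```

Everything else in the archive (the central directory, and through it each entry's local header) is found by following offsets stored in this record, which is why the input has to be seekable.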

This ability to look back complicates things.
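As a rough illustration of what that lookback involves, the sketch below keeps the last 32kB of output in a circular buffer and replays bytes from it when the compressed data says "copy `length` bytes from `distance` bytes back". The class and method names are made up for this example and are not taken from zip.hpp, and output goes to a `std::vector` purely to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 32kB history window; a power of two so indices can be masked.
static const size_t WINDOW_SIZE = 32768;

// Illustrative only: remembers the most recent WINDOW_SIZE output bytes.
class lookback_window {
    std::vector<uint8_t> m_data;
    size_t m_written; // total bytes emitted so far
public:
    lookback_window() : m_data(WINDOW_SIZE), m_written(0) {}

    // Record a byte of output in the window and return it.
    uint8_t put(uint8_t b) {
        m_data[m_written & (WINDOW_SIZE - 1)] = b;
        ++m_written;
        return b;
    }

    // Replay `length` bytes starting `distance` bytes behind the current
    // output position. Returns false if the distance is out of range.
    bool copy_back(size_t distance, size_t length, std::vector<uint8_t>& out) {
        if (distance == 0 || distance > WINDOW_SIZE || distance > m_written) {
            return false;
        }
        for (size_t i = 0; i < length; ++i) {
            // One byte at a time: the source range may overlap the bytes
            // being produced (for example distance 1, length 10).
            uint8_t b = m_data[(m_written - distance) & (WINDOW_SIZE - 1)];
            out.push_back(put(b));
        }
        return true;
    }
};
```

For example, after emitting `a` and `b`, a back-reference with distance 2 and length 4 appends `abab`.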

Originally, I had implemented a bufferless lookback mechanism that used a seekable output stream rather than a 32kB window in RAM, but I ran into a problem seeking backward into the Arduino File object. For some reason, it simply did not work. The other issue is this technique is a whole lot slower.
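For the curious, the bufferless idea looks roughly like the sketch below. It uses a standard `std::iostream` as a stand-in for a seekable output file, which is an assumption made for illustration; the original attempt was against the Arduino File object, and `copy_back_from_stream` is a hypothetical name. Note the two seeks per byte, which is where the slowdown comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: satisfy a back-reference by re-reading bytes that
// were already written to a seekable stream, instead of using a RAM window.
bool copy_back_from_stream(std::iostream& out, std::streamoff distance,
                           size_t length) {
    for (size_t i = 0; i < length; ++i) {
        // Find the current end of the output.
        out.seekp(0, std::ios::end);
        std::streamoff end = out.tellp();
        if (distance <= 0 || distance > end) return false;
        // Seek backward and fetch the byte to repeat.
        out.seekg(end - distance, std::ios::beg);
        int c = out.get();
        if (c == std::char_traits<char>::eof()) return false;
        // Seek forward again and append it.
        out.seekp(0, std::ios::end);
        out.put((char)c);
        if (!out) return false;
    }
    return true;
}
```

Batching the reads would cut down on seeking, but overlapping copies (distance smaller than length) still force some byte-at-a-time work.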

Failing that, I was going to implement an on demand decompression stream so you didn't need to extract the whole stream before beginning to view it. However, doing this requires even more memory. I may eventually implement this optionally at a later date, but doing so would increase the code size substantially.

So after all that, I settled on a method that takes a readable input stream and a writeable output stream and runs an inflate operation, using a temporary 32kB buffer on the heap as the lookback window. I really hate allocating that much RAM on IoT devices, but unless I can track down and solve the issue with the Arduino File object, I am at a loss.
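If you do something similar in your own code, it helps to make the temporary buffer clean up after itself and to fail gracefully when the heap can't supply it. The following is a sketch of one way to do that, not the code zip.hpp actually uses; `temp_buffer` and `run_with_window` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Illustrative RAII wrapper: owns a temporary heap buffer and frees it
// when it goes out of scope.
class temp_buffer {
    uint8_t* m_data;
    size_t m_size;
    temp_buffer(const temp_buffer&);            // non-copyable
    temp_buffer& operator=(const temp_buffer&); // non-copyable
public:
    explicit temp_buffer(size_t size)
        : m_data(static_cast<uint8_t*>(std::malloc(size))),
          m_size(m_data ? size : 0) {}
    ~temp_buffer() { std::free(m_data); }
    bool valid() const { return m_data != nullptr; }
    uint8_t* data() { return m_data; }
    size_t size() const { return m_size; }
};

// Hypothetical extraction entry point: bail out cleanly if RAM is short.
bool run_with_window() {
    temp_buffer window(32768);
    if (!window.valid()) return false; // report out-of-memory to the caller
    window.data()[0] = 0;              // ... inflate using the window here ...
    return true;                       // window is freed automatically
}
```

The 32kB is only held for the duration of the extraction, so the rest of the application gets it back as soon as the call returns.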

Due to the above, when all is said and done, you need somewhere to store the decompressed stream. That almost always means an attached SD reader/writer, unless you have something like an ESP32 with a SPIFFS partition.

Another issue is that the Arduino File objects themselves are pretty stack heavy, so even though the zip archive structures are not, you still need to allocate most of your working objects as globals or otherwise on the heap.

Finally, if the above didn't make it clear, you need at least a middleweight IoT device to run this code. A little ATMega2560 only has 8kB of RAM, so it's just not going to cut it, for example. I've tested this code on an ESP32 WROOM with 512kB of RAM (~ 300kB effectively available), which is much more than this needs to run, but it's my go-to platform.

Now with all of the disclaimers out of the way, let's explore a little more of what I did.

I'm terrible at math, so I went ahead and shamelessly lifted some public domain source for doing the decompression from here. That Git repo is magic by the way, so bookmark it. All the header files there are gold!

Beyond that, I just read through the zip looking for key bits to extract the archive information. Note that not all streams in the archive are actually compressed, so there are two different mechanisms for extraction behind archive_entry.extract().
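The split between the two mechanisms comes down to the compression method field in each entry's header: 0 means the data is stored as-is, and 8 means DEFLATE. Here is a minimal sketch of that dispatch over in-memory buffers. The names (`extract_entry`, `extract_result`, `inflate_fn`) are invented for the example and are not zip.hpp's API, and the inflate routine is supplied by the caller rather than implemented here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Compression method codes defined by the zip format.
enum class zip_method : uint16_t { stored = 0, deflate = 8 };

enum class extract_result { success, unsupported, out_of_space };

// Caller-supplied inflate routine (hypothetical signature).
typedef extract_result (*inflate_fn)(const uint8_t* in, size_t in_size,
                                     uint8_t* out, size_t out_capacity);

// Stored entries are a plain byte-for-byte copy.
static extract_result extract_stored(const uint8_t* in, size_t in_size,
                                     uint8_t* out, size_t out_capacity) {
    if (in_size > out_capacity) return extract_result::out_of_space;
    std::memcpy(out, in, in_size);
    return extract_result::success;
}

// Pick the extraction path based on the entry's compression method.
extract_result extract_entry(uint16_t method,
                             const uint8_t* in, size_t in_size,
                             uint8_t* out, size_t out_capacity,
                             inflate_fn inflate) {
    assert(in != nullptr && out != nullptr);
    switch (static_cast<zip_method>(method)) {
        case zip_method::stored:
            return extract_stored(in, in_size, out, out_capacity);
        case zip_method::deflate:
            // Without an inflate routine we cannot handle this entry.
            if (inflate == nullptr) return extract_result::unsupported;
            return inflate(in, in_size, out, out_capacity);
        default:
            // Other methods (bzip2, LZMA, ...) are not handled here.
            return extract_result::unsupported;
    }
}
```

Stored entries are common in epubs in particular, since the `mimetype` file is required to be uncompressed.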

Using this Mess

This library is short and sweet, which makes using it dead simple.

I'm going to just do a code dump here, which should explain it all. Keep in mind that this particular code is ESP32 specific:

C++
#include <Arduino.h>
// DID YOU UPLOAD THE FILESYSTEM IMAGE YET?
#include <SPIFFS.h>
#include <stream.hpp>
#include <zip.hpp>
using namespace io; // for streams
using namespace zip; // for zips

// if we try to declare all this
// on the stack we run into problems
// apparently the File class is pretty
// heavy:
char path[1024];
File f;
File f2;
archive arch;
archive_entry entry;

void setup() {
  Serial.begin(115200);
  Serial.println();

  SPIFFS.begin(false);
  f=SPIFFS.open("/frankenstein.epub","rb");
  if(!f) {
    Serial.println("File not found");
    while(true);
  }
  file_stream fs(f);
  if(zip_result::success!=archive::open(&fs,&arch)) {
    Serial.println("Zip load failed.");
    while(true);
  }
  Serial.print("Number of files ");
  Serial.println(arch.entries_size());
  
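  // index 11 is specific to this particular epub;
  // valid indices run from 0 to arch.entries_size()-1
  arch.entry(11,&entry);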
  
  if(SPIFFS.exists("/tmp.htm")) {
    SPIFFS.remove("/tmp.htm");
  }

  f2 = SPIFFS.open("/tmp.htm","wb");
  if(!f2) {
    Serial.println("Could not create temp file");
    while(true);
  }
  
  file_stream fs2(f2);
  Serial.print("extracting ");
  entry.copy_path(path,1024);
  Serial.print(path);
  Serial.println("...");
  zip_result rr=entry.extract(&fs2);
  if(zip_result::success!=rr) {
    Serial.print("extraction failed ");
    Serial.println((int)rr);
    while(true);
  }
  Serial.println("extraction complete");
  f.close();
  f2.close();
  f=SPIFFS.open("/tmp.htm","rb");
  if(!f) {
    Serial.println("Temp file not found");
    while(true);
  }
  while(true) {
    int i = f.read();
    if(0>i) break;
    Serial.write(i);
  }
  f.close();
  
}

void loop() {
}

The only thing that might be confusing is the wrapping of the File objects with a file_stream. Essentially, the zip library doesn't know about Arduino Files, but it knows about io::streams. This is so the library can remain cross platform. The reason it doesn't use the std::iostream classes is that the STL isn't always fully available on every platform.
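The general pattern is sketched below with a made-up `byte_stream` interface and an adapter over a C `FILE*`. This is only meant to show the shape of the idea; the real io::stream in stream.hpp may look quite different, and on Arduino the wrapped object would be a File rather than a `FILE*`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical minimal stream interface that library code depends on.
class byte_stream {
public:
    virtual ~byte_stream() {}
    virtual size_t read(uint8_t* dst, size_t size) = 0;
    virtual size_t write(const uint8_t* src, size_t size) = 0;
    virtual bool seek(long offset, int origin) = 0;
};

// Platform-specific adapter: forwards each call to a C FILE*.
class stdio_stream : public byte_stream {
    FILE* m_file; // not owned; the caller opens and closes it
public:
    explicit stdio_stream(FILE* f) : m_file(f) { assert(f != nullptr); }
    size_t read(uint8_t* dst, size_t size) override {
        return std::fread(dst, 1, size, m_file);
    }
    size_t write(const uint8_t* src, size_t size) override {
        return std::fwrite(src, 1, size, m_file);
    }
    bool seek(long offset, int origin) override {
        return 0 == std::fseek(m_file, offset, origin);
    }
};

// "Library" code: knows only about byte_stream, never about FILE*.
size_t copy_stream(byte_stream& in, byte_stream& out) {
    uint8_t buf[64]; // small buffer to stay friendly to tight stacks
    size_t total = 0;
    size_t n;
    while ((n = in.read(buf, sizeof(buf))) > 0) {
        if (out.write(buf, n) != n) break; // stop on a short write
        total += n;
    }
    return total;
}
```

Porting to a new platform then means writing one small adapter class rather than touching the zip code itself.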

The other thing to take note of is that we call copy_path() to get the path out of the archive entry. My IoT libraries are loath to dynamically allocate memory on the heap unless it's absolutely necessary. They usually try to leave the memory allocation to you, and that's exactly what this one is doing here, so using it is like using, say, snprintf() in that it takes a buffer and a maximum size.
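A caller-owned-buffer API in that style might look like the sketch below. `copy_to_buffer` is a hypothetical stand-in, and its truncation behavior and return value are assumptions made for this example; check zip.hpp for what copy_path() really does when the buffer is too small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a copy_path()-style call: copies `src` into a
// caller-owned buffer, truncating to fit, and always null-terminates.
// Returns the number of characters written, excluding the terminator.
size_t copy_to_buffer(const char* src, char* buffer, size_t size) {
    assert(src != nullptr && buffer != nullptr);
    if (size == 0) return 0; // no room even for the terminator
    size_t len = std::strlen(src);
    if (len > size - 1) len = size - 1; // leave room for '\0'
    std::memcpy(buffer, src, len);
    buffer[len] = '\0';
    return len;
}
```

The library never has to guess how much memory you can spare. You decide whether the buffer is a global, lives on the stack, or comes from the heap.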

Don't expect it to be fast. Zips weren't really designed for IoT devices. Still, when you need it, you really need it.

Note that this code probably won't be maintained in the future as a standalone library. I'm rolling it into my user interface framework called UIX which will be released soon, and focusing on that codebase.

History

  • 20th July, 2021 - Initial submission

License

This article, along with any associated source code and files, is licensed under The MIT License