Usage and Examples¶

Here are some basic examples for the most common use cases. There are more parameters and features available than shown here, so check out the API Reference to get a full picture.

SansIO Parser¶

All parsers in this library are based on PushMultipartParser, a fast, secure, non-blocking and incremental parser targeted at framework or application developers that need a high level of control. SansIO means that the parser itself does not make any assumptions about the IO or concurrency model and can be used in any environment, including coroutines, greenlets or callback-based protocol handlers. But it also means that you have to deal with IO yourself.

Here is a low-level example of how the parser loop may look in an asyncio based environment:

import asyncio
import multipart

async def process_multipart(reader: asyncio.StreamReader, boundary: str):

  with multipart.PushMultipartParser(boundary) as parser:

    while not parser.closed:

      # Wait for data
      chunk = await reader.read(1024*64)

      # Process a single chunk of incoming data
      for event in parser.parse(chunk):

        if isinstance(event, multipart.MultipartSegment):
          print(f"Start of part with name: {event.name}")
          print(f"Headers: {event.headerlist}")

          if event.filename:
            print(f"Form file upload with filename: {event.filename}")
          else:
            print("Form text field without a filename")

        elif event:  # Non-empty bytes
          print(f"Received {len(event)} bytes of data")

        else:  # None
          print("End of part")

Here is how it works: Once the parser is set up, you wait for a chunk of data from your client and call PushMultipartParser.parse(). The returned iterator yields zero or more parser events and stops as soon as the parser needs more data or detects the end of the multipart stream. You must fully consume this event iterator before parsing the next chunk of data.

Parser Events: For each multipart segment the parser will emit a single instance of MultipartSegment with header and meta information, followed by zero or more non-empty bytes instances with chunks from the segment body, followed by a single None event to signal the end of the current segment.

Once the end of the multipart stream is reached and the last event was emitted, closed will be true. Any errors or exceeded limits during parsing will raise MultipartError from the iterator.

Note that the parser is used as a context manager here. Closing the parser is important to detect missing or incomplete parts caused by a premature end of input. You can also close the parser by passing in an empty chunk of data or calling PushMultipartParser.close() explicitly.

Dealing with IO¶

The parse() method does not know how to fetch more data. It just stops yielding events and waits for you to call it again with the next chunk. This low-level mode of operation is very flexible, but sometimes more complicated than it needs to be.

If you can provide some blocking or async function that returns the next chunk when called, then you can skip some of the complexity of parse() and use parse_blocking() or parse_async() instead.

Here is what the parser loop may look like in a blocking environment. Instead of an abstract blocking stream you could also read from a socket or environ[wsgi.input]:

import multipart, io

def blocking_example(stream: io.BufferedIOBase, boundary: str):
  with multipart.PushMultipartParser(boundary) as parser:
    for event in parser.parse_blocking(stream.read):
      pass  # Handle parser events

And here is the same loop with an awaitable stream:

import multipart, asyncio

async def async_example(stream: asyncio.StreamReader, boundary: str):
  with multipart.PushMultipartParser(boundary) as parser:
    async for event in parser.parse_async(stream.read):
      pass  # Handle parser events

Buffered Parser¶

The MultipartParser parser is the lazy blocking cousin of PushMultipartParser. It can read from a blocking byte stream (e.g. environ["wsgi.input"]) and emits MultipartPart instances that are either memory- or disk-buffered depending on size.

The main benefit is that you no longer have to assemble the payload chunks of each segment yourself. It is still a streaming parser, which means you can start processing the first completed MultipartPart instances while the client still sends more data.

Here is a basic example for a typical WSGI application:

from multipart import parse_options_header, MultipartParser

def wsgi(environ, start_response):
  content_type, options = parse_options_header(environ["CONTENT_TYPE"])

  if content_type == "multipart/form-data" and 'boundary' in options:
    stream = environ["wsgi.input"]
    boundary = options["boundary"]
    parser = MultipartParser(stream, boundary)

    for part in parser:
      if part.filename:
        print(f"{part.name}: File upload ({part.size} bytes)")
        part.save_as(...)
      elif part.size < 1024:
        print(f"{part.name}: Text field ({part.value!r})")
      else:
        print(f"{part.name}: Text field, but too big to print :/")

    # Free up resources after use
    for part in parser.parts():
      part.close()

Results are cached, so you can iterate or call MultipartParser.get() or MultipartParser.parts() multiple times without triggering any extra work.

Do not forget to close() all parts after use to remove unused temporary files quicker and avoid ResourceWarning. Framework developers may want to add hooks to automatically free up resources after the request ended.

WSGI Helper¶

The WSGI helper functions is_form_request() and parse_form_data() accept a WSGI environ dictionary and support both types of form submission (multipart/form-data and application/x-www-form-urlencoded) at the same time and with the same API. You’ll get two fully populated MultiDict instances in return, one for text fields and the other for file uploads. All from a single parser function.

from multipart import parse_form_data, is_form_request

def wsgi(environ, start_response):
  if is_form_request(environ):
    forms, files = parse_form_data(environ)

    title  = forms["title"]   # type: string
    upload = files["upload"]  # type: MultipartPart
    upload.save_as(...)

Note that form fields that are too large to fit into memory count as file uploads. They will end up as MultipartPart instances without a filename in the files dict instead of forms. This is to protect your app from running out of memory or crashing.

MultipartPart instances are buffered to temporary files on disk if they exceed a certain size. The default limits should be fine for most use cases, but can be configured if you need to. See MultipartParser for configurable limits.

Flask, Bottle & Co¶

Most WSGI web frameworks already have multipart functionality built in, but you may still get better throughput for large files (or better limits control and security) by switching to a more advanced parser library:

import flask
environ = flask.request.environ  # or bottle.request.environ
forms, files = multipart.parse_form_data(environ)

Legacy CGI¶

If you are in the unfortunate position to have to rely on CGI, but can’t use cgi.FieldStorage anymore, it’s possible to build a minimal WSGI environment from a CGI environment and use that with parse_form_data(). This is not a real WSGI environment, but it contains enough information for parse_form_data() to do its job. Do not forget to add proper error handling.

import sys, os, multipart

environ = dict(os.environ.items())
environ['wsgi.input'] = sys.stdin.buffer
forms, files = multipart.parse_form_data(environ)

Multipart

Navigation

Related Topics