.. py:currentmodule:: multipart
.. _HTML5: https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#multipart-form-data
.. _RFC7578: https://www.rfc-editor.org/rfc/rfc7578
.. _WSGI: https://peps.python.org/pep-3333
.. _ASGI: https://asgi.readthedocs.io/en/latest/
.. _SansIO: https://sans-io.readthedocs.io/
.. _asyncio: https://docs.python.org/3/library/asyncio.html
==================
Usage and Examples
==================
Here are some basic examples for the most common use cases. There are more
parameters and features available than shown here, so check out the :doc:`api` to get a full picture.
.. _push-example:
SansIO Parser
=============
All parsers in this library are based on :class:`PushMultipartParser`, a fast,
secure, non-blocking and incremental parser targeted at framework or
application developers that need a high level of control. SansIO_ means that
the parser itself does not make any assumptions about the IO or concurrency model
and can be used in any environment, including coroutines, greenlets or callback-based
protocol handlers. But it also means that you have to deal with IO yourself.
Here is a low-level example how the parser loop may look like in an `asyncio`
based environment:
.. code-block:: python
import asyncio
import multipart
async def process_multipart(reader: asyncio.StreamReader, boundary: str):
with multipart.PushMultipartParser(boundary) as parser:
while not parser.closed:
# Wait for data
chunk = await reader.read(1024*64)
# Process a single chunk of incoming data
for event in parser.parse(chunk):
if isinstance(event, multipart.MultipartSegment):
print(f"Start of part with name: {event.name}")
print(f"Headers: {event.headerlist}")
if current.filename:
print(f"Form file upload with filename: {current.filename}")
else:
print("Form text field without a filename")
elif event: # Non-empty batearray
print(f"Received {len(event)} bytes of data")
else: # None
print("End of part")
Here is how it works: Once the parser is set up, you wait for a chunk of data
from your client and call :meth:`PushMultipartParser.parse`. The returned
iterator yields zero or more *parser events* and stops as soon as the parser
needs more data or detects the end of the multipart stream. You must fully
consume this event iterator before parsing the next chunk of data.
**Parser Events:** For each multipart segment the parser will emit a single
instance of :class:`MultipartSegment` with header and meta information, followed
by zero or more non-empty :class:`bytearray` instances with chunks from the
segment body, followed by a single :data:`None` event to signal the end of
the current segment.
Once the end of the multipart stream is reached and the last event was emitted,
:attr:`closed` will be true. Any errors or exceeded limits during parsing will
raise :exc:`MultipartError` from the iterator.
Note that the parser is used a context manager here. Closing the parser is
important to detect missing or incomplete parts caused by a premature end
of input. You can also close the parser by passing in an empty chunk of data or
calling :meth:`PushMultipartParser.close` explicitly.
Dealing with IO
---------------
The :meth:`parse() ` method does not know how to fetch more
data, it just stops yielding events and waits for you to call it again with more
data. This makes the parser loop more complicated than it needs to be in most situations.
If you can provide a blocking or async function that returns the next chunk, then
you can reduce the parser loop complexity a bit and switch to
:meth:`parse_blocking() ` or
:meth:`parse_async() `, depending on your
environment.
Here is a simplified parser loop that reads from a *blocking* stream like a
socket or ``environ[wsgi.input]``:
.. code-block:: python
import multipart, io
def blocking_example(stream: io.BufferedIOBase, boundary: str):
with multipart.PushMultipartParser(boundary) as parser:
for event in parser.parse_blocking(stream.read):
pass # Handle parser events
And here is the same loop with an `awaitable` read function:
.. code-block:: python
import multipart, asyncio
async def async_example(stream: asyncio.StreamReader, boundary: str):
with multipart.PushMultipartParser(boundary) as parser:
async for event in parser.parse_async(stream.read):
pass # Handle parser events
.. _stream-example:
Buffered Parser
===============
The :class:`MultipartParser` parser is the lazy blocking cousin of
:class:`PushMultipartParser`. It can read from a blocking byte stream (e.g.
``environ["wsgi.input"]``) and emits :class:`MultipartPart` instances that are
either memory- or disk-buffered debending on size.
The main benefit is that you no longer have to assemble the payload chunks of
each segment yourself. It is still a streaming parser, which means you can start
processing the first completed :class:`MultipartPart` instances while the client
still sends more data.
Here is a basic example for a typical WSGI_ application:
.. code-block:: python
from multipart import parse_options_header, MultipartParser
def wsgi(environ, start_response):
content_type, options = parse_options_header(environ["CONTENT_TYPE"])
if content_type == "multipart/form-data" and 'boundary' in options:
stream = environ["wsgi.input"]
boundary = options["boundary"]
parser = MultipartParser(stream, boundary)
for part in parser:
if part.filename:
print(f"{part.name}: File upload ({part.size} bytes)")
part.save_as(...)
elif part.size < 1024:
print(f"{part.name}: Text field ({part.value!r})")
else:
print(f"{part.name}: Test field, but too big to print :/")
# Free up resources after use
for part in parser.parts():
part.close()
Results are cached, so you can iterate or call :meth:`MultipartParser.get` or
:meth:`MultipartParser.parts` multiple times without triggering any extra work.
Do not forget to :meth:`close() ` all parts after use to
remove unused temporary files quicker and avoid :exc:`ResourceWarning`.
Framework developers may want to add hooks to automatically frees up resources
after the request ended.
.. _wsgi-example:
WSGI Helper
===========
The WSGI helper functions :func:`is_form_request` and :func:`parse_form_data`
accept a `WSGI environ` dictionary and support both types of form submission
(``multipart/form-data`` and ``application/x-www-form-urlencoded``) at the same
time and with the same API. You'll get two fully populated :class:`MultiDict`
instances in return, one for text fields and the other for file uploads. All
from a single parser function.
.. code-block:: python
from multipart import parse_form_data, is_form_request
def wsgi(environ, start_response):
if is_form_request(environ):
forms, files = parse_form_data(environ)
title = forms["title"] # type: string
upload = files["upload"] # type: MultipartPart
upload.save_as(...)
Note that form fields that are too large to fit into memory count as file uploads.
They will end up as :class:`MultipartPart` instances without a
:attr:`filename ` in the `files` dict instead of `forms`.
This is to protect your app from running out of memory or crashing.
:class:`MultipartPart` instances are buffered to temporary files on disk if they
exceed a certain size. The default limits should be fine for most use cases, but
can be configured if you need to. See :class:`MultipartParser` for configurable
limits.
Flask, Bottle & Co
==================
Most WSGI web frameworks already have multipart functionality built in, but
you may still get better throughput for large files (or better limits control
and security) by switching to a more advanced parser library:
.. code-block:: python
import flask
environ = flask.request.environ # or bottle.request.environ
forms, files = multipart.parse_form_data(environ)
Legacy CGI
==========
If you are in the unfortunate position to have to rely on CGI, but can't use
:class:`cgi.FieldStorage` anymore, it's possible to build a minimal WSGI environment
from a CGI environment and use that with :func:`parse_form_data`. This is not a real
WSGI environment, but it contains enough information for :func:`parse_form_data`
to do its job. Do not forget to add proper error handling.
.. code-block:: python
import sys, os, multipart
environ = dict(os.environ.items())
environ['wsgi.input'] = sys.stdin.buffer
forms, files = multipart.parse_form_data(environ)