How to know whether a file was fully downloaded with Python and Nginx

Updated on September 21, 2021
Read — 3 minutes

Introduction

I have been working at Go Wombat as a Python Developer & Machine Learning Engineer. Once I got an interesting task on my project: find out whether a file was completely downloaded from a server and record this event in the database through an API. I started looking for a solution and found one on StackOverflow that suggested using the Nginx server. For the API service I used Django and Django REST Framework. After I started implementing the described solution, I realized it had many disadvantages, so here is my own solution to the task.

First of all, you need to create a few microservices:

  • web (for managing API requests);
  • nginx (an Nginx web server);
  • logger (the part responsible for gathering logs and sending them to the web part).

It is also good practice to run these services in Docker containers.

Nginx part

Now it is time to write the nginx.conf file. We will store all the information about a request, so specify the name of the log format, in our case ‘download’. Then describe the format of your future log file: the client IP, the time of the request, the request line, the status code, the number of bytes sent, and the name of the file. To learn more, check the Nginx documentation. Traffic is the key field because, in the API, you compare its value with the actual file size to find out whether the file was downloaded completely (a little trick).

log_format download '{"remote_addr" : "$remote_addr",'
       ' "time":"$time_local",'
       ' "request":"$request", '
       ' "status":"$status",  '
       ' "traffic":$body_bytes_sent, '
       ' "uri": "$uri" }';
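With this format, each completed request produces one JSON object per line, which the logger can parse directly with Python’s json module. Here is an illustrative line (the values are made up, not taken from a real log):

```python
import json

# An illustrative log line in the 'download' format defined above
line = ('{"remote_addr" : "172.18.0.1", "time":"21/Sep/2021:10:00:00 +0000", '
        '"request":"GET /media/report.pdf HTTP/1.1", "status":"200", '
        '"traffic":1048576, "uri": "/media/report.pdf"}')

data = json.loads(line)
print(data["traffic"])             # number of bytes Nginx actually sent
print(data["uri"].split("/")[-1])  # file name extracted from the URI
```

Note that $body_bytes_sent is emitted without quotes, so "traffic" parses as a number rather than a string.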

/media is a directory in the Django app which stores files, so we add an alias to serve files from it, along with a proxy_pass to the web application. After this, just add an access_log directive pointing at the log file.

location /media {
     proxy_pass http://web:8000;
     default_type application/octet-stream;
     alias /web/media/;
     access_log /var/log/nginx/download.checker.log download;
}

Everything else is like in an ordinary Nginx config. The full config:

worker_processes auto;

events {}

http {

  log_format download '{"remote_addr" : "$remote_addr",'
       ' "time":"$time_local",'
       ' "request":"$request", '
       ' "status":"$status",  '
       ' "traffic":$body_bytes_sent, '
       ' "uri": "$uri" }';

  server {
    listen 80; # the port your site will be served on
    charset     utf-8;

    client_max_body_size 700M;  # max upload size

    # configs for the Django app
    location / {
      proxy_pass  http://web:8000;
      proxy_redirect  off;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location /media {
      proxy_pass http://web:8000;
      default_type application/octet-stream;
      alias /web/media/;
      access_log /var/log/nginx/download.checker.log download;
    }
  }
}

In the same directory, add the simple Dockerfile for the Nginx:

FROM nginx

WORKDIR /nginx_logs
COPY nginx.conf /etc/nginx

EXPOSE 80

logger part

Now it is the logger service’s turn. In the project root, create a logger directory, add a logs directory to it, and a download.py file inside. You need to read data from the log file in real time. Use the Popen class from Python’s subprocess module:

import subprocess

log = subprocess.Popen(
    ['tail', '-F', 'download.checker.log'],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE
)

while True:
    send_stats(log)

The send_stats function sends data about a request to the web application, so you need to install the requests module. The full example is shown below:

import subprocess
import json
import requests
import logging


def send_stats(log: subprocess.Popen):
    data = json.loads(log.stdout.readline())
    payload = {
        "traffic": data["traffic"],
        "file": data["uri"].split("/")[-1],
        # you can add your fields
    }

    try:
        response = requests.post(
           url="http://web:8000/api/v1/stats",
           json=payload
        )
        response.raise_for_status()
    except Exception as e:
        logging.error(e)
    else:
        logging.info(f"Log was sent to web API {payload}")


if __name__ == "__main__":
    # the root logger ignores INFO messages by default
    logging.basicConfig(level=logging.INFO)

    log = subprocess.Popen(
        ['tail', '-F', 'download.checker.log'],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE
    )

    while True:
        send_stats(log)
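The article’s trick is to compare the traffic value reported by Nginx with the file’s actual size on disk; the view behind /api/v1/stats is not shown here, so the helper below is a hypothetical sketch of that check, not the project’s real code:

```python
import os


def is_fully_downloaded(traffic: int, file_path: str) -> bool:
    """Return True if the bytes Nginx sent match the file's size on disk."""
    try:
        return traffic == os.path.getsize(file_path)
    except OSError:
        # File missing or unreadable: we cannot confirm a full download
        return False
```

In the Django view handling POST /api/v1/stats, you would join MEDIA_ROOT with payload["file"], run this check, and store the result in the database.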

And the Dockerfile. Nothing unusual here, but one detail deserves attention: it uses the slim variant of the Python image, which saves significant space.

FROM python:3.8-slim-buster

WORKDIR /logs
RUN pip install requests
COPY /logs /logs

CMD ["python", "download.py"]

docker-compose file

In the final step, write a docker-compose file with a shared volume. Why do you need it? Simple: the log file lives in the Nginx container, while the logic for sending data lives in another. So create a shared_logs volume and mount it into both containers.

version: '3.7'
services:

  web:
    # your application with API

  nginx:
    build:
      context: ./nginx
    links:
      - web
    volumes:
      - "shared_logs:/var/log/nginx/"
    ports:
      - "80:80"

  logger:
    build:
      context: ./logger
    links:
      - web
    volumes:
      - "shared_logs:/logs"

volumes:
  shared_logs:

Conclusion

As you can see, it is a simple task, but it may take a long time if you don’t know where to start.

Read our recent article and find out how to use SSH tunnel to expose a local server to the Internet.

