Coverage for cc_modules/celery.py : 38%

#!/usr/bin/env python

"""
camcops_server/cc_modules/celery.py

===============================================================================

    Copyright (C) 2012-2020 Rudolf Cardinal (rudolf@pobox.com).

    This file is part of CamCOPS.

    CamCOPS is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    CamCOPS is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with CamCOPS. If not, see <https://www.gnu.org/licenses/>.

===============================================================================

**Celery app.**

Basic steps to set up Celery:

- Our app will be "camcops_server.cc_modules".
- Within that, Celery expects "celery.py", in which configuration is set up
  by defining the ``app`` object.
- Also, in ``__init__.py``, we should import that app. (No, scratch that; not
  necessary.)
- That makes ``@shared_task`` work in all other modules here.
- Finally, here, we ask Celery to scan ``tasks.py`` to find tasks.

Modified:

- The ``@shared_task`` decorator doesn't offer all the options that
  ``@app.task`` has. Let's skip ``@shared_task`` and the increased faff that
  entails.

The difficult part seems to be getting a broker URL in the config.

- If we load the config here, from ``celery.py``, then if the config uses any
  SQLAlchemy objects, it'll crash because some aren't imported.
- A better way is to delay configuring the app.
- But also, it is very tricky if the config uses SQLAlchemy objects; so it
  shouldn't.

Note also re logging:

- The log here is configured (at times, at least) by Celery, so uses its log
  settings. At the time of startup, that looks like plain ``print()``
  statements.

**In general, prefer delayed imports during actual tasks. Otherwise circular
imports are very hard to avoid.**

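A minimal sketch of the delayed-import pattern, using the standard-library
``json`` module as a stand-in for a heavy or circularly dependent module
(``serialize_payload`` is a hypothetical name, not part of CamCOPS):

.. code-block:: python

    from typing import Any, Dict


    def serialize_payload(payload: Dict[str, Any]) -> str:
        # Delayed import: resolved when the function runs, not when the
        # enclosing module is first imported.
        import json
        return json.dumps(payload, sort_keys=True)


    print(serialize_payload({"recipient": "demo"}))
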
If using a separate ``celery_tasks.py`` file:

- Import this only after celery.py, or the decorators will fail.

- If you see this error from ``camcops_server launch_workers`` when using a
  separate ``celery_tasks.py`` file:

  .. code-block:: none

    [2018-12-26 21:08:01,316: ERROR/MainProcess] Received unregistered task of type 'camcops_server.cc_modules.celery_tasks.export_to_recipient_backend'.
    The message has been ignored and discarded.

    Did you remember to import the module containing this task?
    Or maybe you're using relative imports?

    Please see
    https://docs.celeryq.org/en/latest/internals/protocol.html
    for more information.

    The full contents of the message body was:
    '[["recipient_email_rnc"], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (98b)
    Traceback (most recent call last):
      File "/home/rudolf/dev/venvs/camcops/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 558, in on_task_received
        strategy = strategies[type_]
    KeyError: 'camcops_server.cc_modules.celery_tasks.export_to_recipient_backend'

  then (1) run with ``--verbose``, which will show you the list of registered
  tasks; (2) note that everything here is absent; (3) insert a "crash" line at
  the top of this file and re-run; (4) note what's importing this file too
  early.

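A less drastic variant of step (3), as a sketch: rather than crashing, print
the call stack at import time to show the chain of imports that led here.
This uses only the standard-library ``traceback`` module;
``describe_import_stack`` is a hypothetical helper name.

.. code-block:: python

    import traceback


    def describe_import_stack() -> str:
        # The returned text lists the active frames, most recent call last;
        # at import time, the importing modules appear above this one.
        return "".join(traceback.format_stack())


    # Place this temporarily at the top of the module under investigation:
    print(describe_import_stack())
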
General advice:

- https://medium.com/@taylorhughes/three-quick-tips-from-two-years-with-celery-c05ff9d7f9eb

Task decorator options:

- https://docs.celeryproject.org/en/latest/reference/celery.app.task.html
- ``bind``: makes the first argument a ``self`` parameter to manipulate the
  task itself;
  https://docs.celeryproject.org/en/latest/userguide/tasks.html#example
- ``acks_late`` (for the decorator) or ``task_acks_late``: see

  - https://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-task_acks_late
  - https://docs.celeryproject.org/en/latest/faq.html#faq-acks-late-vs-retry
  - Here I am retrying on failure with exponential backoff, but not using
    ``acks_late`` in addition.

"""  # noqa

import logging
import os
from typing import Any, Dict, TYPE_CHECKING

from cardinal_pythonlib.json.serialize import json_encode, json_decode
from cardinal_pythonlib.logs import BraceStyleAdapter
from celery import Celery, current_task
from kombu.serialization import register

# noinspection PyUnresolvedReferences
import camcops_server.cc_modules.cc_all_models  # import side effects (ensure all models registered)  # noqa

if TYPE_CHECKING:
    from celery.app.task import Task as CeleryTask
    from camcops_server.cc_modules.cc_export import DownloadOptions
    from camcops_server.cc_modules.cc_request import CamcopsRequest
    from camcops_server.cc_modules.cc_taskcollection import TaskCollection

log = BraceStyleAdapter(logging.getLogger(__name__))


# =============================================================================
# Constants
# =============================================================================

CELERY_APP_NAME = "camcops_server.cc_modules"
# CELERY_TASKS_MODULE = "celery_tasks"
# ... look for "celery_tasks.py" (as opposed to the more common "tasks.py")

CELERY_TASK_MODULE_NAME = CELERY_APP_NAME + ".celery"

MAX_RETRIES = 10
CELERY_SOFT_TIME_LIMIT_SEC = 300


# =============================================================================
# Configuration
# =============================================================================

register("json", json_encode, json_decode,
         content_type='application/json',
         content_encoding='utf-8')


def get_celery_settings_dict() -> Dict[str, Any]:
    """
    Returns a dictionary of settings to configure Celery.
    """
    log.debug("Configuring Celery")
    from camcops_server.cc_modules.cc_config import (
        CrontabEntry,
        get_default_config_from_os_env,
    )  # delayed import
    config = get_default_config_from_os_env()

    # -------------------------------------------------------------------------
    # Schedule
    # -------------------------------------------------------------------------
    schedule = {}  # type: Dict[str, Any]

    # -------------------------------------------------------------------------
    # User-defined schedule entries
    # -------------------------------------------------------------------------
    for crontab_entry in config.crontab_entries:
        recipient_name = crontab_entry.content
        schedule_name = f"export_to_{recipient_name}"
        log.info("Adding regular export job {}: crontab: {}",
                 schedule_name, crontab_entry)
        schedule[schedule_name] = {
            "task": CELERY_TASK_MODULE_NAME + ".export_to_recipient_backend",
            "schedule": crontab_entry.get_celery_schedule(),
            "args": (recipient_name, ),
        }

    # -------------------------------------------------------------------------
    # Housekeeping once per minute
    # -------------------------------------------------------------------------
    housekeeping_crontab = CrontabEntry(minute="*", content="dummy")
    schedule["housekeeping"] = {
        "task": CELERY_TASK_MODULE_NAME + ".housekeeping",
        "schedule": housekeeping_crontab.get_celery_schedule(),
    }

    # -------------------------------------------------------------------------
    # Final Celery settings
    # -------------------------------------------------------------------------
    return {
        "beat_schedule": schedule,
        "broker_url": config.celery_broker_url,
        "timezone": config.schedule_timezone,
        "task_annotations": {
            "camcops_server.cc_modules.celery.export_task_backend": {
                "rate_limit": config.celery_export_task_rate_limit,
            }
        },
    }


# =============================================================================
# The Celery app
# =============================================================================

celery_app = Celery()
celery_app.add_defaults(get_celery_settings_dict())
# celery_app.autodiscover_tasks([CELERY_APP_NAME],
#                               related_name=CELERY_TASKS_MODULE)

_ = '''

@celery_app.on_configure.connect
def _app_on_configure(**kwargs) -> None:
    log.critical("@celery_app.on_configure: {!r}", kwargs)


@celery_app.on_after_configure.connect
def _app_on_after_configure(**kwargs) -> None:
    log.critical("@celery_app.on_after_configure: {!r}", kwargs)

'''


# =============================================================================
# Test tasks
# =============================================================================

@celery_app.task(bind=True)
def debug_task(self) -> None:
    """
    Test as follows:

    .. code-block:: python

        from camcops_server.cc_modules.celery import *
        debug_task.delay()

    and also launch workers with ``camcops_server launch_workers``.

    For a bound task, the first (``self``) argument is the task instance; see
    https://docs.celeryproject.org/en/latest/userguide/tasks.html#bound-tasks

    """
    log.info(f"self: {self!r}")
    log.info(f"Backend: {current_task.backend}")


@celery_app.task
def debug_task_add(a: float, b: float) -> float:
    """
    Test as follows:

    .. code-block:: python

        from camcops_server.cc_modules.celery import *
        debug_task_add.delay(1, 2)
    """
    result = a + b
    log.info("a = {}, b = {} => a + b = {}", a, b, result)
    return result


# =============================================================================
# Exponential backoff
# =============================================================================

def backoff(attempts: int) -> int:
    """
    Return a backoff delay, in seconds, given a number of attempts.

    The delay increases very rapidly with the number of attempts:
    1, 2, 4, 8, 16, 32, ...

    As per https://blog.balthazar-rouberol.com/celery-best-practices.

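    For illustration (a sketch, assuming attempts are numbered from zero and
    that at most ``MAX_RETRIES = 10`` retries are made):

    .. code-block:: python

        delays = [2 ** attempt for attempt in range(10)]
        print(delays[:6])  # [1, 2, 4, 8, 16, 32]
        print(sum(delays))  # 1023 seconds in total, i.e. about 17 minutes
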
    """
    return 2 ** attempts


# =============================================================================
# Controlling tasks
# =============================================================================

def purge_jobs() -> None:
    """
    Purge all jobs from the Celery queue.
    """
    celery_app.control.purge()


# =============================================================================
# Note re request creation and context manager
# =============================================================================
# NOTE:
# - You MUST use some sort of context manager to handle requests here, because
#   the normal Pyramid router [which ordinarily calls the "finished" callbacks
#   via request._process_finished_callbacks()] will not be plumbed in.
# - For debugging, use the MySQL command
#       SELECT * FROM information_schema.innodb_locks;


# =============================================================================
# Export tasks
# =============================================================================

@celery_app.task(bind=True,
                 ignore_result=True,
                 max_retries=MAX_RETRIES,
                 soft_time_limit=CELERY_SOFT_TIME_LIMIT_SEC)
def export_task_backend(self: "CeleryTask",
                        recipient_name: str,
                        basetable: str,
                        task_pk: int) -> None:
    """
    This function exports a single task but does so with only simple (string,
    integer) information, so it can be called via the Celery task queue.

    Args:
        self: the Celery task, :class:`celery.app.task.Task`
        recipient_name: export recipient name (as per the config file)
        basetable: name of the task's base table
        task_pk: server PK of the task
    """
    from camcops_server.cc_modules.cc_export import export_task  # delayed import  # noqa
    from camcops_server.cc_modules.cc_request import command_line_request_context  # delayed import  # noqa
    from camcops_server.cc_modules.cc_taskfactory import (
        task_factory_no_security_checks,
    )  # delayed import

    try:
        with command_line_request_context() as req:
            recipient = req.get_export_recipient(recipient_name)
            task = task_factory_no_security_checks(req.dbsession,
                                                   basetable, task_pk)
            if task is None:
                log.error(
                    "export_task_backend for recipient {!r}: No task found "
                    "for {} {}", recipient_name, basetable, task_pk)
                return
            export_task(req, recipient, task)
    except Exception as exc:
        self.retry(countdown=backoff(self.request.retries), exc=exc)


@celery_app.task(bind=True,
                 ignore_result=True,
                 max_retries=MAX_RETRIES,
                 soft_time_limit=CELERY_SOFT_TIME_LIMIT_SEC)
def export_to_recipient_backend(self: "CeleryTask",
                                recipient_name: str) -> None:
    """
    From the backend, exports all pending tasks for a given recipient.

    There are two ways of doing this, when we call
    :func:`camcops_server.cc_modules.cc_export.export`. If we set
    ``schedule_via_backend=True``, this backend job fires up a whole bunch of
    other backend jobs, one per task to export. If we set
    ``schedule_via_backend=False``, our current backend job does all the work.

    Which is best?

    - Well, keeping it to one job is a bit simpler, perhaps.
    - But everything is locked independently so we can do the multi-job
      version, and we may as well use all the workers available. So my thought
      was to use ``schedule_via_backend=True``.
    - However, that led to database deadlocks (multiple processes trying to
      write a new ExportRecipient).
    - With some bugfixes to equality checking and a global lock (see
      :meth:`camcops_server.cc_modules.cc_config.CamcopsConfig.get_master_export_recipient_lockfilename`),
      we can try again with ``True``.
    - Yup, works nicely.

    Args:
        self: the Celery task, :class:`celery.app.task.Task`
        recipient_name: export recipient name (as per the config file)
    """
    from camcops_server.cc_modules.cc_export import export  # delayed import  # noqa
    from camcops_server.cc_modules.cc_request import command_line_request_context  # delayed import  # noqa

    try:
        with command_line_request_context() as req:
            export(req, recipient_names=[recipient_name],
                   schedule_via_backend=True)
    except Exception as exc:
        self.retry(countdown=backoff(self.request.retries), exc=exc)


@celery_app.task(bind=True,
                 ignore_result=True,
                 max_retries=MAX_RETRIES,
                 soft_time_limit=CELERY_SOFT_TIME_LIMIT_SEC)
def email_basic_dump(self: "CeleryTask",
                     collection: "TaskCollection",
                     options: "DownloadOptions") -> None:
    """
    Send a research dump to the user via e-mail.

    Args:
        self:
            the Celery task, :class:`celery.app.task.Task`
        collection:
            a
            :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection`
        options:
            :class:`camcops_server.cc_modules.cc_export.DownloadOptions`
            governing the download
    """
    from camcops_server.cc_modules.cc_export import make_exporter  # delayed import  # noqa
    from camcops_server.cc_modules.cc_request import command_line_request_context  # delayed import  # noqa

    try:
        # Create request for a specific user, so the auditing is correct.
        with command_line_request_context(user_id=options.user_id) as req:
            collection.set_request(req)
            exporter = make_exporter(
                req=req,
                collection=collection,
                options=options
            )
            exporter.send_by_email()

    except Exception as exc:
        self.retry(countdown=backoff(self.request.retries), exc=exc)


@celery_app.task(bind=True,
                 ignore_result=True,
                 max_retries=MAX_RETRIES,
                 soft_time_limit=CELERY_SOFT_TIME_LIMIT_SEC)
def create_user_download(self: "CeleryTask",
                         collection: "TaskCollection",
                         options: "DownloadOptions") -> None:
    """
    Create a research dump file for the user to download later.
    Let them know by e-mail.

    Args:
        self:
            the Celery task, :class:`celery.app.task.Task`
        collection:
            a
            :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection`
        options:
            :class:`camcops_server.cc_modules.cc_export.DownloadOptions`
            governing the download
    """
    from camcops_server.cc_modules.cc_export import make_exporter  # delayed import  # noqa
    from camcops_server.cc_modules.cc_request import command_line_request_context  # delayed import  # noqa

    try:
        # Create request for a specific user, so the auditing is correct.
        with command_line_request_context(user_id=options.user_id) as req:
            collection.set_request(req)
            exporter = make_exporter(
                req=req,
                collection=collection,
                options=options
            )
            exporter.create_user_download_and_email()

    except Exception as exc:
        self.retry(countdown=backoff(self.request.retries), exc=exc)


# =============================================================================
# Housekeeping
# =============================================================================

def delete_old_user_downloads(req: "CamcopsRequest") -> None:
    """
    Deletes user download files that are past their expiry time.

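    The expiry sweep can be sketched with the standard library alone. This
    is an illustration, not the CamCOPS implementation: it assumes that a
    file's age is judged by its modification time, and
    ``delete_files_older_than`` is a hypothetical name.

    .. code-block:: python

        import os
        import time
        from typing import List


        def delete_files_older_than(directory: str,
                                    max_age_s: float) -> List[str]:
            # Walk the tree; delete files last modified before the cutoff.
            cutoff = time.time() - max_age_s
            deleted = []  # type: List[str]
            for root, _dirs, files in os.walk(directory):
                for name in files:
                    path = os.path.join(root, name)
                    if os.path.getmtime(path) < cutoff:
                        os.remove(path)
                        deleted.append(path)
            return deleted
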
    Args:
        req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest`
    """
    from camcops_server.cc_modules.cc_export import UserDownloadFile  # delayed import  # noqa

    now = req.now
    lifetime = req.user_download_lifetime_duration
    oldest_allowed = now - lifetime
    log.debug(f"Deleting any user download files older than {oldest_allowed}")
    for root, dirs, files in os.walk(req.config.user_download_dir):
        for f in files:
            udf = UserDownloadFile(filename=f, directory=root)
            if udf.older_than(oldest_allowed):
                udf.delete()


@celery_app.task(bind=False,
                 ignore_result=True,
                 soft_time_limit=CELERY_SOFT_TIME_LIMIT_SEC)
def housekeeping() -> None:
    """
    Function that is run regularly to do cleanup tasks.

    (Remember that the ``bind`` parameter to ``@celery_app.task()`` means that
    the first argument to the function, typically called ``self``, is the
    Celery task. We don't need it here. See
    https://docs.celeryproject.org/en/latest/userguide/tasks.html#bound-tasks.)
    """
    from camcops_server.cc_modules.cc_request import command_line_request_context  # delayed import  # noqa
    from camcops_server.cc_modules.cc_session import CamcopsSession  # delayed import  # noqa
    from camcops_server.cc_modules.cc_user import (
        SecurityAccountLockout,
        SecurityLoginFailure,
    )  # delayed import

    log.debug("Housekeeping!")
    with command_line_request_context() as req:
        # ---------------------------------------------------------------------
        # Housekeeping tasks
        # ---------------------------------------------------------------------
        # We had a problem with MySQL locking here (two locks open for what
        # appeared to be a single delete, followed by a lock timeout). Seems to
        # be working now.
        CamcopsSession.delete_old_sessions(req)
        SecurityAccountLockout.delete_old_account_lockouts(req)
        SecurityLoginFailure.clear_dummy_login_failures_if_necessary(req)
        delete_old_user_downloads(req)