#!/usr/bin/env python

"""
camcops_server/cc_modules/celery.py

===============================================================================

    Copyright (C) 2012-2020 Rudolf Cardinal (rudolf@pobox.com).

    This file is part of CamCOPS.

    CamCOPS is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    CamCOPS is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with CamCOPS. If not, see <https://www.gnu.org/licenses/>.

===============================================================================

**Celery app.**

Basic steps to set up Celery:

- Our app will be "camcops_server.cc_modules".
- Within that, Celery expects "celery.py", in which configuration is set up
  by defining the ``app`` object.
- Also, in ``__init__.py``, we should import that app. (No, scratch that; not
  necessary.)
- That makes ``@shared_task`` work in all other modules here.
- Finally, here, we ask Celery to scan ``tasks.py`` to find tasks.

Modified:

- The ``@shared_task`` decorator doesn't offer all the options that
  ``@app.task`` has. Let's skip ``@shared_task`` and the increased faff that
  entails.

The difficult part seems to be getting a broker URL in the config.

- If we load the config here, from ``celery.py``, then if the config uses any
  SQLAlchemy objects, it'll crash because some aren't imported.
- A better way is to delay configuring the app.
- But also, it is very tricky if the config uses SQLAlchemy objects; so it
  shouldn't.

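The "delay configuring the app" idea above can be sketched in plain Python,
with no Celery dependency. This is only an illustration under stated
assumptions: ``LazySettings`` and ``load_settings`` are hypothetical names, and
the broker URL is made up; none of them belong to CamCOPS or Celery.

```python
from typing import Any, Callable, Dict, Optional


class LazySettings:
    # Hypothetical helper: defers an expensive or import-sensitive config
    # load until a setting is first requested.

    def __init__(self, loader: Callable[[], Dict[str, Any]]) -> None:
        self._loader = loader
        self._cache: Optional[Dict[str, Any]] = None
        self.load_count = 0  # how many times the loader has actually run

    def get(self, key: str) -> Any:
        if self._cache is None:
            self._cache = self._loader()  # runs only once, on first use
            self.load_count += 1
        return self._cache[key]


def load_settings() -> Dict[str, Any]:
    # Stand-in for reading the real config file; the URL is made up.
    return {"broker_url": "amqp://guest@localhost//"}


settings = LazySettings(load_settings)  # nothing is loaded at import time
assert settings.load_count == 0
broker_url = settings.get("broker_url")  # first access triggers the load
```

The Celery documentation describes ``Celery.add_defaults()`` as also accepting
a callable that is evaluated only when the configuration is needed, which is
the same idea.
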

Note also re logging:

- The log here is configured (at times, at least) by Celery, so uses its log
  settings. At the time of startup, that looks like plain ``print()``
  statements.

**In general, prefer delayed imports during actual tasks. Otherwise circular
imports are very hard to avoid.**

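A minimal sketch of a delayed import, using a standard-library module as a
stand-in for a project module that would otherwise create an import cycle.
The function name ``summarize`` is hypothetical.

```python
def summarize(values):
    # Delayed import: "statistics" is loaded when the function first runs,
    # not when the enclosing module is imported. In a real task, this would
    # be a project module that itself imports the module defining the task.
    import statistics

    return statistics.mean(values)


result = summarize([1, 2, 3, 6])
```

Python caches imported modules, so repeating the import on every call costs
little after the first call.
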

If using a separate ``celery_tasks.py`` file:

- Import this only after celery.py, or the decorators will fail.

- If you see this error from ``camcops_server launch_workers`` when using a
  separate ``celery_tasks.py`` file:

  .. code-block:: none

    [2018-12-26 21:08:01,316: ERROR/MainProcess] Received unregistered task of type 'camcops_server.cc_modules.celery_tasks.export_to_recipient_backend'.
    The message has been ignored and discarded.

    Did you remember to import the module containing this task?
    Or maybe you're using relative imports?

    Please see
    https://docs.celeryq.org/en/latest/internals/protocol.html
    for more information.

    The full contents of the message body was:
    '[["recipient_email_rnc"], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (98b)
    Traceback (most recent call last):
      File "/home/rudolf/dev/venvs/camcops/lib/python3.6/site-packages/celery/worker/consumer/consumer.py", line 558, in on_task_received
        strategy = strategies[type_]
    KeyError: 'camcops_server.cc_modules.celery_tasks.export_to_recipient_backend'

  then (1) run with ``--verbose``, which will show you the list of registered
  tasks; (2) note that everything here is absent; (3) insert a "crash" line at
  the top of this file and re-run; (4) note what's importing this file too
  early.

General advice:

- https://medium.com/@taylorhughes/three-quick-tips-from-two-years-with-celery-c05ff9d7f9eb

Task decorator options:

- https://docs.celeryproject.org/en/latest/reference/celery.app.task.html
- ``bind``: makes the first argument a ``self`` parameter to manipulate the
  task itself;
  https://docs.celeryproject.org/en/latest/userguide/tasks.html#example
- ``acks_late`` (for the decorator) or ``task_acks_late``: see

  - https://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-task_acks_late
  - https://docs.celeryproject.org/en/latest/faq.html#faq-acks-late-vs-retry
  - Here I am retrying on failure with exponential backoff, but not using
    ``acks_late`` in addition.

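As a worked example of the retry policy, the sketch below lists the countdowns
that exponential backoff produces with this module's retry limit. It copies the
``backoff()`` formula and the ``MAX_RETRIES`` value from the code below, and
assumes each countdown is honoured exactly, ignoring task execution time.

```python
MAX_RETRIES = 10  # same value as the constant defined later in this module


def backoff(attempts: int) -> int:
    # Same formula as the backoff() function in this module.
    return 2 ** attempts


# Celery's self.request.retries is 0 on the first failure, so the retry
# countdowns are backoff(0), backoff(1), ..., backoff(MAX_RETRIES - 1).
delays = [backoff(n) for n in range(MAX_RETRIES)]
total_wait = sum(delays)
print(delays)      # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
print(total_wait)  # 1023
```

So a task that keeps failing is abandoned after roughly 17 minutes of
accumulated countdowns.
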

"""  # noqa

import logging
import os
from typing import Any, Dict, TYPE_CHECKING

from cardinal_pythonlib.json.serialize import json_encode, json_decode
from cardinal_pythonlib.logs import BraceStyleAdapter
from celery import Celery, current_task
from kombu.serialization import register

# noinspection PyUnresolvedReferences
import camcops_server.cc_modules.cc_all_models  # import side effects (ensure all models registered)  # noqa

if TYPE_CHECKING:
    from celery.app.task import Task as CeleryTask
    from camcops_server.cc_modules.cc_export import DownloadOptions
    from camcops_server.cc_modules.cc_request import CamcopsRequest
    from camcops_server.cc_modules.cc_taskcollection import TaskCollection

log = BraceStyleAdapter(logging.getLogger(__name__))


# =============================================================================
# Constants
# =============================================================================

CELERY_APP_NAME = "camcops_server.cc_modules"
# CELERY_TASKS_MODULE = "celery_tasks"
# ... look for "celery_tasks.py" (as opposed to the more common "tasks.py")

CELERY_TASK_MODULE_NAME = CELERY_APP_NAME + ".celery"

MAX_RETRIES = 10
CELERY_SOFT_TIME_LIMIT_SEC = 300


# =============================================================================
# Configuration
# =============================================================================

register("json", json_encode, json_decode,
         content_type='application/json',
         content_encoding='utf-8')


def get_celery_settings_dict() -> Dict[str, Any]:
    log.debug("Configuring Celery")
    from camcops_server.cc_modules.cc_config import (
        CrontabEntry,
        get_default_config_from_os_env,
    )  # delayed import
    config = get_default_config_from_os_env()

    # -------------------------------------------------------------------------
    # Schedule
    # -------------------------------------------------------------------------
    schedule = {}  # type: Dict[str, Any]

    # -------------------------------------------------------------------------
    # User-defined schedule entries
    # -------------------------------------------------------------------------
    for crontab_entry in config.crontab_entries:
        recipient_name = crontab_entry.content
        schedule_name = f"export_to_{recipient_name}"
        log.info("Adding regular export job {}: crontab: {}",
                 schedule_name, crontab_entry)
        schedule[schedule_name] = {
            "task": CELERY_TASK_MODULE_NAME + ".export_to_recipient_backend",
            "schedule": crontab_entry.get_celery_schedule(),
            "args": (recipient_name, ),
        }

    # -------------------------------------------------------------------------
    # Housekeeping once per minute
    # -------------------------------------------------------------------------
    housekeeping_crontab = CrontabEntry(minute="*", content="dummy")
    schedule["housekeeping"] = {
        "task": CELERY_TASK_MODULE_NAME + ".housekeeping",
        "schedule": housekeeping_crontab.get_celery_schedule(),
    }

    # -------------------------------------------------------------------------
    # Final Celery settings
    # -------------------------------------------------------------------------
    return {
        "beat_schedule": schedule,
        "broker_url": config.celery_broker_url,
        "timezone": config.schedule_timezone,
        "task_annotations": {
            # Same task name as before, built from the module-name constant.
            CELERY_TASK_MODULE_NAME + ".export_task_backend": {
                "rate_limit": config.celery_export_task_rate_limit,
            }
        },
    }


# =============================================================================
# The Celery app
# =============================================================================

celery_app = Celery()
celery_app.add_defaults(get_celery_settings_dict())
# celery_app.autodiscover_tasks([CELERY_APP_NAME],
#                               related_name=CELERY_TASKS_MODULE)

_ = '''

@celery_app.on_configure.connect
def _app_on_configure(**kwargs) -> None:
    log.critical("@celery_app.on_configure: {!r}", kwargs)


@celery_app.on_after_configure.connect
def _app_on_after_configure(**kwargs) -> None:
    log.critical("@celery_app.on_after_configure: {!r}", kwargs)

'''


# =============================================================================
# Test tasks
# =============================================================================

@celery_app.task(bind=True)
def debug_task(self) -> None:
    """
    Test as follows:

    .. code-block:: python

        from camcops_server.cc_modules.celery import *
        debug_task.delay()

    and also launch workers with ``camcops_server launch_workers``.

    For a bound task, the first (``self``) argument is the task instance; see
    https://docs.celeryproject.org/en/latest/userguide/tasks.html#bound-tasks

    """
    # Pass values as arguments rather than via f-strings: the brace-style log
    # adapter formats the message itself, so stray braces in a repr are safe.
    log.info("self: {!r}", self)
    log.info("Backend: {}", current_task.backend)


@celery_app.task
def debug_task_add(a: float, b: float) -> float:
    """
    Test as follows:

    .. code-block:: python

        from camcops_server.cc_modules.celery import *
        debug_task_add.delay(1, 2)
    """
    result = a + b
    log.info("a = {}, b = {} => a + b = {}", a, b, result)
    return result


# =============================================================================
# Exponential backoff
# =============================================================================

def backoff(attempts: int) -> int:
    """
    Return a backoff delay, in seconds, given a number of attempts.

    The delay doubles with each attempt: 1, 2, 4, 8, 16, 32, ... seconds for
    attempts 0, 1, 2, 3, 4, 5, ...

    As per https://blog.balthazar-rouberol.com/celery-best-practices.

    """
    return 2 ** attempts


# =============================================================================
# Controlling tasks
# =============================================================================

def purge_jobs() -> None:
    """
    Purge all jobs from the Celery queue.
    """
    celery_app.control.purge()


# =============================================================================
# Note re request creation and context manager
# =============================================================================
# NOTE:
# - You MUST use some sort of context manager to handle requests here, because
#   the normal Pyramid router [which ordinarily called the "finished" callbacks
#   via request._process_finished_callbacks()] will not be plumbed in.
# - For debugging, use the MySQL command
#       SELECT * FROM information_schema.innodb_locks;


# =============================================================================
# Export tasks
# =============================================================================

@celery_app.task(bind=True,
                 ignore_result=True,
                 max_retries=MAX_RETRIES,
                 soft_time_limit=CELERY_SOFT_TIME_LIMIT_SEC)
def export_task_backend(self: "CeleryTask",
                        recipient_name: str,
                        basetable: str,
                        task_pk: int) -> None:
    """
    This function exports a single task but does so with only simple (string,
    integer) information, so it can be called via the Celery task queue.

    Args:
        self: the Celery task, :class:`celery.app.task.Task`
        recipient_name: export recipient name (as per the config file)
        basetable: name of the task's base table
        task_pk: server PK of the task
    """
    from camcops_server.cc_modules.cc_export import export_task  # delayed import  # noqa
    from camcops_server.cc_modules.cc_request import command_line_request_context  # delayed import  # noqa
    from camcops_server.cc_modules.cc_taskfactory import (
        task_factory_no_security_checks,
    )  # delayed import

    try:
        with command_line_request_context() as req:
            recipient = req.get_export_recipient(recipient_name)
            task = task_factory_no_security_checks(req.dbsession,
                                                   basetable, task_pk)
            if task is None:
                log.error(
                    "export_task_backend for recipient {!r}: No task found "
                    "for {} {}", recipient_name, basetable, task_pk)
                return
            export_task(req, recipient, task)
    except Exception as exc:
        self.retry(countdown=backoff(self.request.retries), exc=exc)


@celery_app.task(bind=True,
                 ignore_result=True,
                 max_retries=MAX_RETRIES,
                 soft_time_limit=CELERY_SOFT_TIME_LIMIT_SEC)
def export_to_recipient_backend(self: "CeleryTask",
                                recipient_name: str) -> None:
    """
    From the backend, exports all pending tasks for a given recipient.

    There are two ways of doing this, when we call
    :func:`camcops_server.cc_modules.cc_export.export`. If we set
    ``schedule_via_backend=True``, this backend job fires up a whole bunch of
    other backend jobs, one per task to export. If we set
    ``schedule_via_backend=False``, our current backend job does all the work.

    Which is best?

    - Well, keeping it to one job is a bit simpler, perhaps.
    - But everything is locked independently so we can do the multi-job
      version, and we may as well use all the workers available. So my thought
      was to use ``schedule_via_backend=True``.
    - However, that led to database deadlocks (multiple processes trying to
      write a new ExportRecipient).
    - With some bugfixes to equality checking and a global lock (see
      :meth:`camcops_server.cc_modules.cc_config.CamcopsConfig.get_master_export_recipient_lockfilename`),
      we can try again with ``True``.
    - Yup, works nicely.

    Args:
        self: the Celery task, :class:`celery.app.task.Task`
        recipient_name: export recipient name (as per the config file)
    """
    from camcops_server.cc_modules.cc_export import export  # delayed import  # noqa
    from camcops_server.cc_modules.cc_request import command_line_request_context  # delayed import  # noqa

    try:
        with command_line_request_context() as req:
            export(req, recipient_names=[recipient_name],
                   schedule_via_backend=True)
    except Exception as exc:
        self.retry(countdown=backoff(self.request.retries), exc=exc)


@celery_app.task(bind=True,
                 ignore_result=True,
                 max_retries=MAX_RETRIES,
                 soft_time_limit=CELERY_SOFT_TIME_LIMIT_SEC)
def email_basic_dump(self: "CeleryTask",
                     collection: "TaskCollection",
                     options: "DownloadOptions") -> None:
    """
    Send a research dump to the user via e-mail.

    Args:
        self:
            the Celery task, :class:`celery.app.task.Task`
        collection:
            a
            :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection`
        options:
            :class:`camcops_server.cc_modules.cc_export.DownloadOptions`
            governing the download
    """
    from camcops_server.cc_modules.cc_export import make_exporter  # delayed import  # noqa
    from camcops_server.cc_modules.cc_request import command_line_request_context  # delayed import  # noqa

    try:
        # Create request for a specific user, so the auditing is correct.
        with command_line_request_context(user_id=options.user_id) as req:
            collection.set_request(req)
            exporter = make_exporter(
                req=req,
                collection=collection,
                options=options
            )
            exporter.send_by_email()

    except Exception as exc:
        self.retry(countdown=backoff(self.request.retries), exc=exc)


@celery_app.task(bind=True,
                 ignore_result=True,
                 max_retries=MAX_RETRIES,
                 soft_time_limit=CELERY_SOFT_TIME_LIMIT_SEC)
def create_user_download(self: "CeleryTask",
                         collection: "TaskCollection",
                         options: "DownloadOptions") -> None:
    """
    Create a research dump file for the user to download later.
    Let them know by e-mail.

    Args:
        self:
            the Celery task, :class:`celery.app.task.Task`
        collection:
            a
            :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection`
        options:
            :class:`camcops_server.cc_modules.cc_export.DownloadOptions`
            governing the download
    """
    from camcops_server.cc_modules.cc_export import make_exporter  # delayed import  # noqa
    from camcops_server.cc_modules.cc_request import command_line_request_context  # delayed import  # noqa

    try:
        # Create request for a specific user, so the auditing is correct.
        with command_line_request_context(user_id=options.user_id) as req:
            collection.set_request(req)
            exporter = make_exporter(
                req=req,
                collection=collection,
                options=options
            )
            exporter.create_user_download_and_email()

    except Exception as exc:
        self.retry(countdown=backoff(self.request.retries), exc=exc)


# =============================================================================
# Housekeeping
# =============================================================================

def delete_old_user_downloads(req: "CamcopsRequest") -> None:
    """
    Deletes user download files that are past their expiry time.

    Args:
        req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest`
    """
    from camcops_server.cc_modules.cc_export import UserDownloadFile  # delayed import  # noqa

    now = req.now
    lifetime = req.user_download_lifetime_duration
    oldest_allowed = now - lifetime
    # Brace-style arguments, as elsewhere in this module, rather than an
    # f-string passed to the brace-style log adapter.
    log.debug("Deleting any user download files older than {}",
              oldest_allowed)
    # Walk the whole download directory tree, not just its top level.
    for root, dirs, files in os.walk(req.config.user_download_dir):
        for f in files:
            udf = UserDownloadFile(filename=f, directory=root)
            if udf.older_than(oldest_allowed):
                udf.delete()


@celery_app.task(bind=False,
                 ignore_result=True,
                 soft_time_limit=CELERY_SOFT_TIME_LIMIT_SEC)
def housekeeping() -> None:
    """
    Function that is run regularly to do cleanup tasks.

    (Remember that the ``bind`` parameter to ``@celery_app.task()`` means that
    the first argument to the function, typically called ``self``, is the
    Celery task. We don't need it here. See
    https://docs.celeryproject.org/en/latest/userguide/tasks.html#bound-tasks.)
    """
    from camcops_server.cc_modules.cc_request import command_line_request_context  # delayed import  # noqa
    from camcops_server.cc_modules.cc_session import CamcopsSession  # delayed import  # noqa
    from camcops_server.cc_modules.cc_user import (
        SecurityAccountLockout,
        SecurityLoginFailure,
    )  # delayed import

    log.debug("Housekeeping!")
    with command_line_request_context() as req:
        # ---------------------------------------------------------------------
        # Housekeeping tasks
        # ---------------------------------------------------------------------
        # We had a problem with MySQL locking here (two locks open for what
        # appeared to be a single delete, followed by a lock timeout). Seems to
        # be working now.
        CamcopsSession.delete_old_sessions(req)
        SecurityAccountLockout.delete_old_account_lockouts(req)
        SecurityLoginFailure.clear_dummy_login_failures_if_necessary(req)
        delete_old_user_downloads(req)