Coverage for cc_modules/cc_export.py: 31%

471 statements  

« prev     ^ index     » next       coverage.py v6.5.0, created at 2022-11-08 23:14 +0000

1#!/usr/bin/env python 

2 

3# noinspection HttpUrlsUsage 

4""" 

5camcops_server/cc_modules/cc_export.py 

6 

7=============================================================================== 

8 

9 Copyright (C) 2012, University of Cambridge, Department of Psychiatry. 

10 Created by Rudolf Cardinal (rnc1001@cam.ac.uk). 

11 

12 This file is part of CamCOPS. 

13 

14 CamCOPS is free software: you can redistribute it and/or modify 

15 it under the terms of the GNU General Public License as published by 

16 the Free Software Foundation, either version 3 of the License, or 

17 (at your option) any later version. 

18 

19 CamCOPS is distributed in the hope that it will be useful, 

20 but WITHOUT ANY WARRANTY; without even the implied warranty of 

21 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 

22 GNU General Public License for more details. 

23 

24 You should have received a copy of the GNU General Public License 

25 along with CamCOPS. If not, see <https://www.gnu.org/licenses/>. 

26 

27=============================================================================== 

28 

29.. _ActiveMQ: https://activemq.apache.org/ 

30.. _AMQP: https://www.amqp.org/ 

31.. _APScheduler: https://apscheduler.readthedocs.io/ 

32.. _Celery: https://www.celeryproject.org/ 

33.. _Dramatiq: https://dramatiq.io/ 

34.. _RabbitMQ: https://www.rabbitmq.com/ 

35.. _Redis: https://redis.io/ 

36.. _ZeroMQ: https://zeromq.org/ 

37 

38**Export and research dump functions.** 

39 

40Export design: 

41 

42*WHICH RECORDS TO SEND?* 

43 

44The most powerful mechanism is not to have a sending queue (which would then 

45require careful multi-instance locking), but to have a "sent" log. That way: 

46 

47- A record needs sending if it's not in the sent log (for an appropriate 

48 recipient). 

49- You can add a new recipient and the system will know about the (new) 

50 backlog automatically. 

51- You can specify criteria, e.g. don't upload records before 1/1/2014, and 

52 modify that later, and it would catch up with the backlog. 

53- Successes and failures are logged in the same table. 

54- Multiple recipients are handled with ease. 

55- No need to alter database.pl code that receives from tablets. 

56- Can run with a simple cron job. 

57 

58*LOCKING* 

59 

60- Don't use database locking: 

61 https://blog.engineyard.com/2011/5-subtle-ways-youre-using-mysql-as-a-queue-and-why-itll-bite-you 

62- Locking via UNIX lockfiles: 

63 

64 - https://pypi.python.org/pypi/lockfile 

65 - http://pythonhosted.org/lockfile/ (which also works on Windows) 

66 

67 - On UNIX, ``lockfile`` uses ``LinkLockFile``: 

68 https://github.com/smontanaro/pylockfile/blob/master/lockfile/linklockfile.py 

69 

70*MESSAGE QUEUE AND BACKEND* 

71 

72Thoughts as of 2018-12-22. 

73 

74- See https://www.fullstackpython.com/task-queues.html. Also http://queues.io/; 

75 https://stackoverflow.com/questions/731233/activemq-or-rabbitmq-or-zeromq-or. 

76 

77- The "default" is Celery_, with ``celery beat`` for scheduling, via an 

78 AMQP_ broker like RabbitMQ_. 

79 

80 - Downside: no longer supported under Windows as of Celery 4. 

81 

82 - There are immediate bugs when running the demo code with Celery 4.2.1, 

83 fixed by setting the environment variable ``set 

84 FORKED_BY_MULTIPROCESSING=1`` before running the worker; see 

85 https://github.com/celery/celery/issues/4178 and 

86 https://github.com/celery/celery/pull/4078. 

87 

88 - Downside: backend is complex; e.g. Erlang dependency of RabbitMQ. 

89 

90 - Celery also supports Redis_, but Redis_ doesn't support Windows directly 

91 (except the Windows Subsystem for Linux in Windows 10+). 

92 

93- Another possibility is Dramatiq_ with APScheduler_. 

94 

95 - Of note, APScheduler_ can use an SQLAlchemy database table as its job 

96 store, which might be good. 

97 - Dramatiq_ uses RabbitMQ_ or Redis_. 

98 - Dramatiq_ 1.4.0 (2018-11-25) installs cleanly under Windows. Use ``pip 

99 install --upgrade "dramatic[rabbitmq, watch]"`` (i.e. with double quotse, 

100 not the single quotes it suggests, which don't work under Windows). 

101 - However, the basic example (https://dramatiq.io/guide.html) fails under 

102 Windows; when you fire up ``dramatic count_words`` (even with ``--processes 

103 1 --threads 1``) it crashes with an error from ``ForkingPickler`` in 

104 ``multiprocessing.reduction``, i.e. 

105 https://docs.python.org/3/library/multiprocessing.html#windows. It also 

106 emits a ``PermissionError: [WinError 5] Access is denied``. This is 

107 discussed a bit at https://github.com/Bogdanp/dramatiq/issues/75; 

108 https://github.com/Bogdanp/dramatiq/blob/master/docs/source/changelog.rst. 

109 The changelog suggests 1.4.0 should work, but it doesn't. 

110 

111- Worth some thought about ZeroMQ_, which is a very different sort of thing. 

112 Very cross-platform. Needs work to guard against message loss (i.e. messages 

113 are unreliable by default). Dynamic "special socket" style. 

114 

115- Possibly also ActiveMQ_. 

116 

117- OK; so speed is not critical but we want message reliability, for it to work 

118 under Windows, and decent Python bindings with job scheduling. 

119 

120 - OUT: Redis (not Windows easily), ZeroMQ (fast but not by default reliable), 

121 ActiveMQ (few Python frameworks?). 

122 - REMAINING for message handling: RabbitMQ. 

123 - Python options therefore: Celery (but Windows not officially supported from 

124 4+); Dramatiq (but Windows also not very well supported and seems a bit 

125 bleeding-edge). 

126 

127- This is looking like a mess from the Windows perspective. 

128 

129- An alternative is just to use the database, of course. 

130 

131 - https://softwareengineering.stackexchange.com/questions/351449/message-queue-database-vs-dedicated-mq 

132 - http://mikehadlow.blogspot.com/2012/04/database-as-queue-anti-pattern.html 

133 - https://blog.jooq.org/2014/09/26/using-your-rdbms-for-messaging-is-totally-ok/ 

134 - https://stackoverflow.com/questions/13005410/why-do-we-need-message-brokers-like-rabbitmq-over-a-database-like-postgresql 

135 - https://www.quora.com/What-is-the-best-practice-using-db-tables-or-message-queues-for-moderation-of-content-approved-by-humans 

136 

137- Let's take a step back and summarize the problem. 

138 

139 - Many web threads may upload tasks. This should trigger a prompt export for 

140 all push recipients. 

141 - Whichever way we schedule a backend task job, it should be as the 

142 combination of recipient, basetable, task PK. (That way, if one recipient 

143 fails, the others can proceed independently.) 

144 - Every job should check that it's not been completed already (in case of 

145 accidental job restarts), i.e. is idempotent as far as we can make it. 

146 - How should this interact with the non-push recipients? 

147 - We should use the same locking method for push and non-push recipients. 

148 - We should make the locking granular and use file locks -- for example, for 

149 each task/recipient combination (or each whole-database export for a given 

150 recipient). 

151 

152""" # noqa 

153 

154from contextlib import ExitStack 

155import json 

156import logging 

157import os 

158import sqlite3 

159import tempfile 

160from typing import ( 

161 Dict, 

162 List, 

163 Generator, 

164 Optional, 

165 Set, 

166 Tuple, 

167 Type, 

168 TYPE_CHECKING, 

169 Union, 

170) 

171 

172from cardinal_pythonlib.classes import gen_all_subclasses 

173from cardinal_pythonlib.datetimefunc import ( 

174 format_datetime, 

175 get_now_localtz_pendulum, 

176 get_tz_local, 

177 get_tz_utc, 

178) 

179from cardinal_pythonlib.email.sendmail import CONTENT_TYPE_TEXT 

180from cardinal_pythonlib.fileops import relative_filename_within_dir 

181from cardinal_pythonlib.json.serialize import register_for_json 

182from cardinal_pythonlib.logs import BraceStyleAdapter 

183from cardinal_pythonlib.pyramid.responses import ( 

184 OdsResponse, 

185 SqliteBinaryResponse, 

186 TextAttachmentResponse, 

187 XlsxResponse, 

188 ZipResponse, 

189) 

190from cardinal_pythonlib.sizeformatter import bytes2human 

191from cardinal_pythonlib.sqlalchemy.session import get_safe_url_from_engine 

192import lockfile 

193from pendulum import DateTime as Pendulum, Duration, Period 

194from pyramid.httpexceptions import HTTPBadRequest 

195from pyramid.renderers import render_to_response 

196from pyramid.response import Response 

197from sqlalchemy.engine import create_engine 

198from sqlalchemy.engine.result import ResultProxy 

199from sqlalchemy.orm import Session as SqlASession, sessionmaker 

200from sqlalchemy.sql.expression import text 

201from sqlalchemy.sql.schema import Column, MetaData, Table 

202from sqlalchemy.sql.sqltypes import Text 

203 

204from camcops_server.cc_modules.cc_audit import audit 

205from camcops_server.cc_modules.cc_constants import DateFormat, JSON_INDENT 

206from camcops_server.cc_modules.cc_dataclasses import SummarySchemaInfo 

207from camcops_server.cc_modules.cc_db import ( 

208 REMOVE_COLUMNS_FOR_SIMPLIFIED_SPREADSHEETS, 

209) 

210from camcops_server.cc_modules.cc_dump import copy_tasks_and_summaries 

211from camcops_server.cc_modules.cc_email import Email 

212from camcops_server.cc_modules.cc_exception import FhirExportException 

213from camcops_server.cc_modules.cc_exportmodels import ( 

214 ExportedTask, 

215 ExportRecipient, 

216 gen_tasks_having_exportedtasks, 

217 get_collection_for_export, 

218) 

219from camcops_server.cc_modules.cc_forms import UserDownloadDeleteForm 

220from camcops_server.cc_modules.cc_pyramid import Routes, ViewArg, ViewParam 

221from camcops_server.cc_modules.cc_simpleobjects import TaskExportOptions 

222from camcops_server.cc_modules.cc_sqlalchemy import sql_from_sqlite_database 

223from camcops_server.cc_modules.cc_task import SNOMED_TABLENAME, Task 

224from camcops_server.cc_modules.cc_spreadsheet import ( 

225 SpreadsheetCollection, 

226 SpreadsheetPage, 

227) 

228from camcops_server.cc_modules.celery import ( 

229 create_user_download, 

230 email_basic_dump, 

231 export_task_backend, 

232 jittered_delay_s, 

233) 

234 

235if TYPE_CHECKING: 

236 from camcops_server.cc_modules.cc_request import CamcopsRequest 

237 from camcops_server.cc_modules.cc_taskcollection import TaskCollection 

238 

239log = BraceStyleAdapter(logging.getLogger(__name__)) 

240 

241 

242# ============================================================================= 

243# Constants 

244# ============================================================================= 

245 

246INFOSCHEMA_PAGENAME = "_camcops_information_schema_columns" 

247SUMMARYSCHEMA_PAGENAME = "_camcops_column_explanations" 

248REMOVE_TABLES_FOR_SIMPLIFIED_SPREADSHEETS = {SNOMED_TABLENAME} 

249EMPTY_SET = set() 

250 

251 

252# ============================================================================= 

253# Export tasks from the back end 

254# ============================================================================= 

255 

256 

257def print_export_queue( 

258 req: "CamcopsRequest", 

259 recipient_names: List[str] = None, 

260 all_recipients: bool = False, 

261 via_index: bool = True, 

262 pretty: bool = False, 

263 debug_show_fhir: bool = False, 

264 debug_fhir_include_docs: bool = False, 

265) -> None: 

266 """ 

267 Shows tasks that would be exported. 

268 

269 - Called from the command line. 

270 

271 Args: 

272 req: 

273 a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

274 recipient_names: 

275 list of export recipient names (as per the config file) 

276 all_recipients: 

277 use all recipients? 

278 via_index: 

279 use the task index (faster)? 

280 pretty: 

281 use ``str(task)`` not ``repr(task)`` (prettier, but slower because 

282 it has to query the patient) 

283 debug_show_fhir: 

284 Show FHIR output for each task, as JSON? 

285 debug_fhir_include_docs: 

286 (If debug_show_fhir.) Include document content? Large! 

287 """ 

288 recipients = req.get_export_recipients( 

289 recipient_names=recipient_names, 

290 all_recipients=all_recipients, 

291 save=False, 

292 ) 

293 if not recipients: 

294 log.warning("No export recipients") 

295 return 

296 for recipient in recipients: 

297 log.info("Tasks to be exported for recipient: {}", recipient) 

298 collection = get_collection_for_export( 

299 req, recipient, via_index=via_index 

300 ) 

301 for task in collection.gen_tasks_by_class(): 

302 print( 

303 f"{recipient.recipient_name}: " 

304 f"{str(task) if pretty else repr(task)}" 

305 ) 

306 if debug_show_fhir: 

307 try: 

308 bundle = task.get_fhir_bundle( 

309 req, 

310 recipient, 

311 skip_docs_if_other_content=not debug_fhir_include_docs, 

312 ) 

313 bundle_str = json.dumps( 

314 bundle.as_json(), indent=JSON_INDENT 

315 ) 

316 log.info("FHIR output as JSON:\n{}", bundle_str) 

317 except FhirExportException as e: 

318 log.info("Task has no non-document content:\n{}", e) 

319 

320 

321def export( 

322 req: "CamcopsRequest", 

323 recipient_names: List[str] = None, 

324 all_recipients: bool = False, 

325 via_index: bool = True, 

326 schedule_via_backend: bool = False, 

327) -> None: 

328 """ 

329 Exports all relevant tasks (pending incremental exports, or everything if 

330 applicable) for specified export recipients. 

331 

332 - Called from the command line, or from 

333 :func:`camcops_server.cc_modules.celery.export_to_recipient_backend`. 

334 - Calls :func:`export_whole_database` or :func:`export_tasks_individually`. 

335 

336 Args: 

337 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

338 recipient_names: list of export recipient names (as per the config 

339 file) 

340 all_recipients: use all recipients? 

341 via_index: use the task index (faster)? 

342 schedule_via_backend: schedule jobs via the backend instead? 

343 """ 

344 recipients = req.get_export_recipients( 

345 recipient_names=recipient_names, all_recipients=all_recipients 

346 ) 

347 if not recipients: 

348 log.warning("No export recipients") 

349 return 

350 

351 for recipient in recipients: 

352 log.info("Exporting to recipient: {}", recipient.recipient_name) 

353 if recipient.using_db(): 

354 if schedule_via_backend: 

355 raise NotImplementedError( 

356 "Not yet implemented: whole-database export via Celery " 

357 "backend" 

358 ) # todo: implement whole-database export via Celery backend # noqa 

359 else: 

360 export_whole_database(req, recipient, via_index=via_index) 

361 else: 

362 # Non-database recipient. 

363 export_tasks_individually( 

364 req, 

365 recipient, 

366 via_index=via_index, 

367 schedule_via_backend=schedule_via_backend, 

368 ) 

369 log.info("Finished exporting to {}", recipient.recipient_name) 

370 

371 

372def export_whole_database( 

373 req: "CamcopsRequest", recipient: ExportRecipient, via_index: bool = True 

374) -> None: 

375 """ 

376 Exports to a database. 

377 

378 - Called by :func:`export`. 

379 - Holds a recipient-specific "database" file lock in the process. 

380 

381 Args: 

382 req: 

383 a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

384 recipient: 

385 an 

386 :class:`camcops_server.cc_modules.cc_exportmodels.ExportRecipient` 

387 via_index: 

388 use the task index (faster)? 

389 """ 

390 cfg = req.config 

391 lockfilename = cfg.get_export_lockfilename_recipient_db( 

392 recipient_name=recipient.recipient_name 

393 ) 

394 try: 

395 with lockfile.FileLock(lockfilename, timeout=0): # doesn't wait 

396 collection = get_collection_for_export( 

397 req, recipient, via_index=via_index 

398 ) 

399 dst_engine = create_engine( 

400 recipient.db_url, echo=recipient.db_echo 

401 ) 

402 log.info( 

403 "Exporting to database: {}", 

404 get_safe_url_from_engine(dst_engine), 

405 ) 

406 dst_session = sessionmaker(bind=dst_engine)() # type: SqlASession 

407 task_generator = gen_tasks_having_exportedtasks(collection) 

408 export_options = TaskExportOptions( 

409 include_blobs=recipient.db_include_blobs, 

410 db_patient_id_per_row=recipient.db_patient_id_per_row, 

411 db_make_all_tables_even_empty=True, 

412 db_include_summaries=recipient.db_add_summaries, 

413 ) 

414 copy_tasks_and_summaries( 

415 tasks=task_generator, 

416 dst_engine=dst_engine, 

417 dst_session=dst_session, 

418 export_options=export_options, 

419 req=req, 

420 ) 

421 dst_session.commit() 

422 except lockfile.AlreadyLocked: 

423 log.warning( 

424 "Export logfile {!r} already locked by another process; " 

425 "aborting (another process is doing this work)", 

426 lockfilename, 

427 ) 

428 # No need to retry by raising -- if someone else holds this lock, they 

429 # are doing the work that we wanted to do. 

430 

431 

432def export_tasks_individually( 

433 req: "CamcopsRequest", 

434 recipient: ExportRecipient, 

435 via_index: bool = True, 

436 schedule_via_backend: bool = False, 

437) -> None: 

438 """ 

439 Exports all necessary tasks for a recipient. 

440 

441 - Called by :func:`export`. 

442 - Calls :func:`export_task`, if ``schedule_via_backend`` is False. 

443 - Schedules :func:``camcops_server.cc_modules.celery.export_task_backend``, 

444 if ``schedule_via_backend`` is True, which calls :func:`export` in turn. 

445 

446 Args: 

447 req: 

448 a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

449 recipient: 

450 an 

451 :class:`camcops_server.cc_modules.cc_exportmodels.ExportRecipient` 

452 via_index: 

453 use the task index (faster)? 

454 schedule_via_backend: 

455 schedule jobs via the backend instead? 

456 """ 

457 collection = get_collection_for_export(req, recipient, via_index=via_index) 

458 n_tasks = 0 

459 recipient_name = recipient.recipient_name 

460 if schedule_via_backend: 

461 for task_or_index in collection.gen_all_tasks_or_indexes(): 

462 if isinstance(task_or_index, Task): 

463 basetable = task_or_index.tablename 

464 task_pk = task_or_index.pk 

465 else: 

466 basetable = task_or_index.task_table_name 

467 task_pk = task_or_index.task_pk 

468 log.info( 

469 "Scheduling job to export task {}.{} to {}", 

470 basetable, 

471 task_pk, 

472 recipient_name, 

473 ) 

474 export_task_backend.delay( 

475 recipient_name=recipient_name, 

476 basetable=basetable, 

477 task_pk=task_pk, 

478 ) 

479 n_tasks += 1 

480 log.info( 

481 f"Scheduled {n_tasks} background task exports to " 

482 f"{recipient_name}" 

483 ) 

484 else: 

485 for task in collection.gen_tasks_by_class(): 

486 # Do NOT use this to check the working of export_task_backend(): 

487 # export_task_backend(recipient.recipient_name, task.tablename, task.pk) # noqa 

488 # ... it will deadlock at the database (because we're already 

489 # within a query of some sort, I presume) 

490 export_task(req, recipient, task) 

491 n_tasks += 1 

492 log.info(f"Exported {n_tasks} tasks to {recipient_name}") 

493 

494 

495def export_task( 

496 req: "CamcopsRequest", recipient: ExportRecipient, task: Task 

497) -> None: 

498 """ 

499 Exports a single task, checking that it remains valid to do so. 

500 

501 - Called by :func:`export_tasks_individually` directly, or called via 

502 :func:``camcops_server.cc_modules.celery.export_task_backend`` if 

503 :func:`export_tasks_individually` requested that. 

504 - Calls 

505 :meth:`camcops_server.cc_modules.cc_exportmodels.ExportedTask.export`. 

506 - For FHIR, holds a recipient-specific "FHIR" file lock during export. 

507 - Always holds a recipient-and-task-specific file lock during export. 

508 

509 Args: 

510 req: 

511 a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

512 recipient: 

513 an 

514 :class:`camcops_server.cc_modules.cc_exportmodels.ExportRecipient` 

515 task: 

516 a :class:`camcops_server.cc_modules.cc_task.Task` 

517 """ 

518 

519 # Double-check it's OK! Just in case, for example, an old backend task has 

520 # persisted, or someone's managed to get an iffy back-end request in some 

521 # other way. 

522 if not recipient.is_task_suitable(task): 

523 # Warning will already have been emitted (by is_task_suitable). 

524 return 

525 

526 cfg = req.config 

527 lockfilename = cfg.get_export_lockfilename_recipient_task( 

528 recipient_name=recipient.recipient_name, 

529 basetable=task.tablename, 

530 pk=task.pk, 

531 ) 

532 dbsession = req.dbsession 

533 with ExitStack() as stack: 

534 

535 if recipient.using_fhir() and not recipient.fhir_concurrent: 

536 # Some FHIR servers struggle with parallel processing, so we hold 

537 # a lock to serialize them. See notes in cc_fhir.py. 

538 # 

539 # We always use the order (1) FHIR lockfile, (2) task lockfile, to 

540 # avoid a deadlock. 

541 # 

542 # (Note that it is impossible that a non-FHIR task export grabs the 

543 # second of these without the first, because the second lockfile is 

544 # recipient-specific and the recipient details include the fact 

545 # that it is a FHIR recipient.) 

546 fhir_lockfilename = cfg.get_export_lockfilename_recipient_fhir( 

547 recipient_name=recipient.recipient_name 

548 ) 

549 try: 

550 stack.enter_context( 

551 lockfile.FileLock( 

552 fhir_lockfilename, timeout=jittered_delay_s() 

553 ) 

554 # waits for a while 

555 ) 

556 except lockfile.AlreadyLocked: 

557 log.warning( 

558 "Export logfile {!r} already locked by another process; " 

559 "will try again later", 

560 fhir_lockfilename, 

561 ) 

562 raise 

563 # We will reschedule via Celery; see "self.retry(...)" in 

564 # celery.py 

565 

566 try: 

567 stack.enter_context( 

568 lockfile.FileLock(lockfilename, timeout=0) # doesn't wait 

569 ) 

570 # We recheck the export status once we hold the lock, in case 

571 # multiple jobs are competing to export it. 

572 if ExportedTask.task_already_exported( 

573 dbsession=dbsession, 

574 recipient_name=recipient.recipient_name, 

575 basetable=task.tablename, 

576 task_pk=task.pk, 

577 ): 

578 log.info( 

579 "Task {!r} already exported to recipient {}; " "ignoring", 

580 task, 

581 recipient, 

582 ) 

583 # Not a warning; it's normal to see these because it allows the 

584 # client API to skip some checks for speed. 

585 return 

586 # OK; safe to export now. 

587 et = ExportedTask(recipient, task) 

588 dbsession.add(et) 

589 et.export(req) 

590 dbsession.commit() # so the ExportedTask is visible to others ASAP 

591 except lockfile.AlreadyLocked: 

592 log.warning( 

593 "Export logfile {!r} already locked by another process; " 

594 "aborting (another process is doing this work)", 

595 lockfilename, 

596 ) 

597 

598 

599# ============================================================================= 

600# Helpers for task collection export functions 

601# ============================================================================= 

602 

603 

604def gen_audited_tasks_for_task_class( 

605 collection: "TaskCollection", 

606 cls: Type[Task], 

607 audit_descriptions: List[str], 

608) -> Generator[Task, None, None]: 

609 """ 

610 Generates tasks from a collection, for a given task class, simultaneously 

611 adding to an audit description. Used for user-triggered downloads. 

612 

613 Args: 

614 collection: 

615 a 

616 :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection` 

617 cls: 

618 the task class to generate 

619 audit_descriptions: 

620 list of strings to be modified 

621 

622 Yields: 

623 :class:`camcops_server.cc_modules.cc_task.Task` objects 

624 """ 

625 pklist = [] # type: List[int] 

626 for task in collection.tasks_for_task_class(cls): 

627 pklist.append(task.pk) 

628 yield task 

629 audit_descriptions.append( 

630 f"{cls.__tablename__}: " f"{','.join(str(pk) for pk in pklist)}" 

631 ) 

632 

633 

634def gen_audited_tasks_by_task_class( 

635 collection: "TaskCollection", audit_descriptions: List[str] 

636) -> Generator[Task, None, None]: 

637 """ 

638 Generates tasks from a collection, across task classes, simultaneously 

639 adding to an audit description. Used for user-triggered downloads. 

640 

641 Args: 

642 collection: a :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection` 

643 audit_descriptions: list of strings to be modified 

644 

645 Yields: 

646 :class:`camcops_server.cc_modules.cc_task.Task` objects 

647 """ # noqa 

648 for cls in collection.task_classes(): 

649 for task in gen_audited_tasks_for_task_class( 

650 collection, cls, audit_descriptions 

651 ): 

652 yield task 

653 

654 

655def get_information_schema_query(req: "CamcopsRequest") -> ResultProxy: 

656 """ 

657 Returns an SQLAlchemy query object that fetches the 

658 INFORMATION_SCHEMA.COLUMNS information from our source database. 

659 

660 This is not sensitive; there is no data, just structure/comments. 

661 """ 

662 # Find our database name 

663 # https://stackoverflow.com/questions/53554458/sqlalchemy-get-database-name-from-engine 

664 dbname = req.engine.url.database 

665 # Query the information schema for our database. 

666 # https://docs.sqlalchemy.org/en/13/core/sqlelement.html#sqlalchemy.sql.expression.text # noqa 

667 query = text( 

668 """ 

669 SELECT * 

670 FROM information_schema.columns 

671 WHERE table_schema = :dbname 

672 """ 

673 ).bindparams(dbname=dbname) 

674 result_proxy = req.dbsession.execute(query) 

675 return result_proxy 

676 

677 

678def get_information_schema_spreadsheet_page( 

679 req: "CamcopsRequest", page_name: str = INFOSCHEMA_PAGENAME 

680) -> SpreadsheetPage: 

681 """ 

682 Returns the server database's ``INFORMATION_SCHEMA.COLUMNS`` table as a 

683 :class:`camcops_server.cc_modules.cc_spreadsheet.SpreadsheetPage``. 

684 """ 

685 result_proxy = get_information_schema_query(req) 

686 return SpreadsheetPage.from_resultproxy(page_name, result_proxy) 

687 

688 

689def write_information_schema_to_dst( 

690 req: "CamcopsRequest", 

691 dst_session: SqlASession, 

692 dest_table_name: str = INFOSCHEMA_PAGENAME, 

693) -> None: 

694 """ 

695 Writes the server's information schema to a separate database session 

696 (which will be an SQLite database being created for download). 

697 

698 There must be no open transactions (i.e. please COMMIT before you call 

699 this function), since we need to create a table. 

700 """ 

701 # 1. Read the structure of INFORMATION_SCHEMA.COLUMNS itself. 

702 # https://stackoverflow.com/questions/21770829/sqlalchemy-copy-schema-and-data-of-subquery-to-another-database # noqa 

703 src_engine = req.engine 

704 dst_engine = dst_session.bind 

705 metadata = MetaData(bind=dst_engine) 

706 table = Table( 

707 "columns", # table name; see also "schema" argument 

708 metadata, # "load with the destination metadata" 

709 # Override some specific column types by hand, or they'll fail as 

710 # SQLAlchemy fails to reflect the MySQL LONGTEXT type properly: 

711 Column("COLUMN_DEFAULT", Text), 

712 Column("COLUMN_TYPE", Text), 

713 Column("GENERATION_EXPRESSION", Text), 

714 autoload=True, # "read (reflect) structure from the database" 

715 autoload_with=src_engine, # "read (reflect) structure from the source" 

716 schema="information_schema", # schema 

717 ) 

718 # 2. Write that structure to our new database. 

719 table.name = dest_table_name # create it with a different name 

720 table.schema = "" # we don't have a schema in the destination database 

721 table.create(dst_engine) # CREATE TABLE 

722 # 3. Fetch data. 

723 query = get_information_schema_query(req) 

724 # 4. Write the data. 

725 for row in query: 

726 dst_session.execute(table.insert(row)) 

727 # 5. COMMIT 

728 dst_session.commit() 

729 

730 

731# ============================================================================= 

732# Convert task collections to different export formats for user download 

733# ============================================================================= 

734 

735 

736@register_for_json 

737class DownloadOptions(object): 

738 """ 

739 Represents options for the process of the user downloading tasks. 

740 """ 

741 

742 DELIVERY_MODES = [ViewArg.DOWNLOAD, ViewArg.EMAIL, ViewArg.IMMEDIATELY] 

743 

744 def __init__( 

745 self, 

746 user_id: int, 

747 viewtype: str, 

748 delivery_mode: str, 

749 spreadsheet_simplified: bool = False, 

750 spreadsheet_sort_by_heading: bool = False, 

751 db_include_blobs: bool = False, 

752 db_patient_id_per_row: bool = False, 

753 include_information_schema_columns: bool = True, 

754 include_summary_schema: bool = True, 

755 ) -> None: 

756 """ 

757 Args: 

758 user_id: 

759 ID of the user creating the request (may be needed to pass to 

760 the back-end) 

761 viewtype: 

762 file format for receiving data (e.g. XLSX, SQLite) 

763 delivery_mode: 

764 method of delivery (e.g. immediate, e-mail) 

765 spreadsheet_sort_by_heading: 

766 (For spreadsheets.) 

767 Sort columns within each page by heading name? 

768 db_include_blobs: 

769 (For database downloads.) 

770 Include BLOBs? 

771 db_patient_id_per_row: 

772 (For database downloads.) 

773 Denormalize by include the patient ID in all rows of 

774 patient-related tables? 

775 include_information_schema_columns: 

776 Include descriptions of the database source columns? 

777 include_summary_schema: 

778 Include descriptions of summary columns and other columns in 

779 output spreadsheets? 

780 """ 

781 assert delivery_mode in self.DELIVERY_MODES 

782 self.user_id = user_id 

783 self.viewtype = viewtype 

784 self.delivery_mode = delivery_mode 

785 self.spreadsheet_simplified = spreadsheet_simplified 

786 self.spreadsheet_sort_by_heading = spreadsheet_sort_by_heading 

787 self.db_include_blobs = db_include_blobs 

788 self.db_patient_id_per_row = db_patient_id_per_row 

789 self.include_information_schema_columns = ( 

790 include_information_schema_columns 

791 ) 

792 self.include_summary_schema = include_summary_schema 

793 

794 

795class TaskCollectionExporter(object): 

796 """ 

797 Class to provide tasks for user download. 

798 """ 

799 

800 def __init__( 

801 self, 

802 req: "CamcopsRequest", 

803 collection: "TaskCollection", 

804 options: DownloadOptions, 

805 ): 

806 """ 

807 Args: 

808 req: 

809 a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

810 collection: 

811 a :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection` 

812 options: 

813 :class:`DownloadOptions` governing the download 

814 """ # noqa 

815 self.req = req 

816 self.collection = collection 

817 self.options = options 

818 

819 @property 

820 def viewtype(self) -> str: 

821 raise NotImplementedError("Exporter needs to implement 'viewtype'") 

822 

823 @property 

824 def file_extension(self) -> str: 

825 raise NotImplementedError( 

826 "Exporter needs to implement 'file_extension'" 

827 ) 

828 

829 def get_filename(self) -> str: 

830 """ 

831 Returns the filename for the download. 

832 """ 

833 timestamp = format_datetime(self.req.now, DateFormat.FILENAME) 

834 return f"CamCOPS_dump_{timestamp}.{self.file_extension}" 

835 

836 def immediate_response(self, req: "CamcopsRequest") -> Response: 

837 """ 

838 Returns either a :class:`Response` with the data, or a 

839 :class:`Response` saying how the user will obtain their data later. 

840 

841 Args: 

842 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

843 """ 

844 if self.options.delivery_mode == ViewArg.EMAIL: 

845 self.schedule_email() 

846 return render_to_response( 

847 "email_scheduled.mako", dict(), request=req 

848 ) 

849 elif self.options.delivery_mode == ViewArg.DOWNLOAD: 

850 self.schedule_download() 

851 return render_to_response( 

852 "download_scheduled.mako", dict(), request=req 

853 ) 

854 else: # ViewArg.IMMEDIATELY 

855 return self.download_now() 

856 

857 def download_now(self) -> Response: 

858 """ 

859 Download the data dump in the selected format 

860 """ 

861 filename, body = self.to_file() 

862 return self.get_data_response(body=body, filename=filename) 

863 

864 def schedule_email(self) -> None: 

865 """ 

866 Schedule the export asynchronously and e-mail the logged in user 

867 when done 

868 """ 

869 email_basic_dump.delay(self.collection, self.options) 

870 

871 def send_by_email(self) -> None: 

872 """ 

873 Send the data dump by e-mail to the logged in user 

874 """ 

875 _ = self.req.gettext 

876 config = self.req.config 

877 

878 filename, body = self.to_file() 

879 email_to = self.req.user.email 

880 email = Email( 

881 # date: automatic 

882 from_addr=config.email_from, 

883 to=email_to, 

884 subject=_("CamCOPS research data dump"), 

885 body=_("The research data dump you requested is attached."), 

886 content_type=CONTENT_TYPE_TEXT, 

887 charset="utf8", 

888 attachments_binary=[(filename, body)], 

889 ) 

890 email.send( 

891 host=config.email_host, 

892 username=config.email_host_username, 

893 password=config.email_host_password, 

894 port=config.email_port, 

895 use_tls=config.email_use_tls, 

896 ) 

897 

898 if email.sent: 

899 log.info(f"Research dump emailed to {email_to}") 

900 else: 

901 log.error(f"Failed to email research dump to {email_to}") 

902 

903 def schedule_download(self) -> None: 

904 """ 

905 Schedule a background export to a file that the user can download 

906 later. 

907 """ 

908 create_user_download.delay(self.collection, self.options) 

909 

910 def create_user_download_and_email(self) -> None: 

911 """ 

912 Creates a user download, and e-mails the user to let them know. 

913 """ 

914 _ = self.req.gettext 

915 config = self.req.config 

916 

917 download_dir = self.req.user_download_dir 

918 space = self.req.user_download_bytes_available 

919 filename, contents = self.to_file() 

920 size = len(contents) 

921 

922 if size > space: 

923 # Not enough space 

924 total_permitted = self.req.user_download_bytes_permitted 

925 msg = _( 

926 "You do not have enough space to create this download. " 

927 "You are allowed {total_permitted} bytes and you are have " 

928 "{space} bytes free. This download would need {size} bytes." 

929 ).format(total_permitted=total_permitted, space=space, size=size) 

930 else: 

931 # Create file 

932 fullpath = os.path.join(download_dir, filename) 

933 try: 

934 with open(fullpath, "wb") as f: 

935 f.write(contents) 

936 # Success 

937 log.info(f"Created user download: {fullpath}") 

938 msg = ( 

939 _( 

940 "The research data dump you requested is ready to be " 

941 "downloaded. You will find it in your download area. " 

942 "It is called %s" 

943 ) 

944 % filename 

945 ) 

946 except Exception as e: 

947 # Some other error 

948 msg = _( 

949 "Failed to create file {filename}. Error was: {message}" 

950 ).format(filename=filename, message=e) 

951 

952 # E-mail the user, if they have an e-mail address 

953 email_to = self.req.user.email 

954 if email_to: 

955 email = Email( 

956 # date: automatic 

957 from_addr=config.email_from, 

958 to=email_to, 

959 subject=_("CamCOPS research data dump"), 

960 body=msg, 

961 content_type=CONTENT_TYPE_TEXT, 

962 charset="utf8", 

963 ) 

964 email.send( 

965 host=config.email_host, 

966 username=config.email_host_username, 

967 password=config.email_host_password, 

968 port=config.email_port, 

969 use_tls=config.email_use_tls, 

970 ) 

971 

972 def get_data_response(self, body: bytes, filename: str) -> Response: 

973 raise NotImplementedError("Exporter needs to implement 'get_response'") 

974 

975 def to_file(self) -> Tuple[str, bytes]: 

976 """ 

977 Returns the tuple ``filename, file_contents``. 

978 """ 

979 return self.get_filename(), self.get_file_body() 

980 

981 def get_file_body(self) -> bytes: 

982 """ 

983 Returns binary data to be stored as a file. 

984 """ 

985 raise NotImplementedError( 

986 "Exporter needs to implement 'get_file_body'" 

987 ) 

988 

989 def get_spreadsheet_collection(self) -> SpreadsheetCollection: 

990 """ 

991 Converts the collection of tasks to a collection of spreadsheet-style 

992 data. Also audits the request as a basic data dump. 

993 

994 Returns: 

995 a 

996 :class:`camcops_server.cc_modules.cc_spreadsheet.SpreadsheetCollection` 

997 object 

998 """ # noqa 

999 audit_descriptions = [] # type: List[str] 

1000 options = self.options 

1001 if options.spreadsheet_simplified: 

1002 summary_exclusion_tables = ( 

1003 REMOVE_TABLES_FOR_SIMPLIFIED_SPREADSHEETS 

1004 ) 

1005 summary_exclusion_columns = ( 

1006 REMOVE_COLUMNS_FOR_SIMPLIFIED_SPREADSHEETS 

1007 ) 

1008 else: 

1009 summary_exclusion_tables = EMPTY_SET 

1010 summary_exclusion_columns = EMPTY_SET 

1011 # Task may return >1 sheet for output (e.g. for subtables). 

1012 coll = SpreadsheetCollection() 

1013 

1014 # Iterate through tasks, creating the spreadsheet collection 

1015 schema_elements = set() # type: Set[SummarySchemaInfo] 

1016 for cls in self.collection.task_classes(): 

1017 schema_done = False 

1018 for task in gen_audited_tasks_for_task_class( 

1019 self.collection, cls, audit_descriptions 

1020 ): 

1021 # Task data 

1022 coll.add_pages(task.get_spreadsheet_pages(self.req)) 

1023 if not schema_done and options.include_summary_schema: 

1024 # Schema (including summary explanations) 

1025 schema_elements |= task.get_spreadsheet_schema_elements( 

1026 self.req 

1027 ) 

1028 # We just need this from one task instance. 

1029 schema_done = True 

1030 

1031 if options.include_summary_schema: 

1032 coll.add_page( 

1033 SpreadsheetPage( 

1034 name=SUMMARYSCHEMA_PAGENAME, 

1035 rows=[ 

1036 si.as_dict 

1037 for si in sorted(schema_elements) 

1038 if si.column_name not in summary_exclusion_columns 

1039 and si.table_name not in summary_exclusion_tables 

1040 ], 

1041 ) 

1042 ) 

1043 

1044 if options.include_information_schema_columns: 

1045 # Source database information schema 

1046 coll.add_page(get_information_schema_spreadsheet_page(self.req)) 

1047 

1048 # Simplify 

1049 if options.spreadsheet_simplified: 

1050 coll.delete_pages(summary_exclusion_tables) 

1051 coll.delete_columns(summary_exclusion_columns) 

1052 

1053 # Sort 

1054 coll.sort_pages() 

1055 if options.spreadsheet_sort_by_heading: 

1056 coll.sort_headings_within_all_pages() 

1057 

1058 # Audit 

1059 audit(self.req, f"Basic dump: {'; '.join(audit_descriptions)}") 

1060 

1061 return coll 

1062 

1063 

1064class OdsExporter(TaskCollectionExporter): 

1065 """ 

1066 Converts a set of tasks to an OpenOffice ODS file. 

1067 """ 

1068 

1069 file_extension = "ods" 

1070 viewtype = ViewArg.ODS 

1071 

1072 def get_file_body(self) -> bytes: 

1073 return self.get_spreadsheet_collection().as_ods() 

1074 

1075 def get_data_response(self, body: bytes, filename: str) -> Response: 

1076 return OdsResponse(body=body, filename=filename) 

1077 

1078 

1079class RExporter(TaskCollectionExporter): 

1080 """ 

1081 Converts a set of tasks to an R script. 

1082 """ 

1083 

1084 file_extension = "R" 

1085 viewtype = ViewArg.R 

1086 

1087 def __init__(self, *args, **kwargs) -> None: 

1088 super().__init__(*args, **kwargs) 

1089 self.encoding = "utf-8" 

1090 

1091 def get_file_body(self) -> bytes: 

1092 return self.get_r_script().encode(self.encoding) 

1093 

1094 def get_r_script(self) -> str: 

1095 return self.get_spreadsheet_collection().as_r() 

1096 

1097 def get_data_response(self, body: bytes, filename: str) -> Response: 

1098 filename = self.get_filename() 

1099 r_script = self.get_r_script() 

1100 return TextAttachmentResponse(body=r_script, filename=filename) 

1101 

1102 

1103class TsvZipExporter(TaskCollectionExporter): 

1104 """ 

1105 Converts a set of tasks to a set of TSV (tab-separated value) file, (one 

1106 per table) in a ZIP file. 

1107 """ 

1108 

1109 file_extension = "zip" 

1110 viewtype = ViewArg.TSV_ZIP 

1111 

1112 def get_file_body(self) -> bytes: 

1113 return self.get_spreadsheet_collection().as_zip() 

1114 

1115 def get_data_response(self, body: bytes, filename: str) -> Response: 

1116 return ZipResponse(body=body, filename=filename) 

1117 

1118 

1119class XlsxExporter(TaskCollectionExporter): 

1120 """ 

1121 Converts a set of tasks to an Excel XLSX file. 

1122 """ 

1123 

1124 file_extension = "xlsx" 

1125 viewtype = ViewArg.XLSX 

1126 

1127 def get_file_body(self) -> bytes: 

1128 return self.get_spreadsheet_collection().as_xlsx() 

1129 

1130 def get_data_response(self, body: bytes, filename: str) -> Response: 

1131 return XlsxResponse(body=body, filename=filename) 

1132 

1133 

1134class SqliteExporter(TaskCollectionExporter): 

1135 """ 

1136 Converts a set of tasks to an SQLite binary file. 

1137 """ 

1138 

1139 file_extension = "sqlite" 

1140 viewtype = ViewArg.SQLITE 

1141 

1142 def get_export_options(self) -> TaskExportOptions: 

1143 return TaskExportOptions( 

1144 include_blobs=self.options.db_include_blobs, 

1145 db_include_summaries=True, 

1146 db_make_all_tables_even_empty=True, # debatable, but more consistent! # noqa 

1147 db_patient_id_per_row=self.options.db_patient_id_per_row, 

1148 ) 

1149 

1150 def get_sqlite_data(self, as_text: bool) -> Union[bytes, str]: 

1151 """ 

1152 Returns data as a binary SQLite database, or SQL text to create it. 

1153 

1154 Args: 

1155 as_text: textual SQL, rather than binary SQLite? 

1156 

1157 Returns: 

1158 ``bytes`` or ``str``, according to ``as_text`` 

1159 """ 

1160 # --------------------------------------------------------------------- 

1161 # Create memory file, dumper, and engine 

1162 # --------------------------------------------------------------------- 

1163 

1164 # This approach failed: 

1165 # 

1166 # memfile = io.StringIO() 

1167 # 

1168 # def dump(querysql, *multiparams, **params): 

1169 # compsql = querysql.compile(dialect=engine.dialect) 

1170 # memfile.write("{};\n".format(compsql)) 

1171 # 

1172 # engine = create_engine('{dialect}://'.format(dialect=dialect_name), 

1173 # strategy='mock', executor=dump) 

1174 # dst_session = sessionmaker(bind=engine)() # type: SqlASession 

1175 # 

1176 # ... you get the error 

1177 # AttributeError: 'MockConnection' object has no attribute 'begin' 

1178 # ... which is fair enough. 

1179 # 

1180 # Next best thing: SQLite database. 

1181 # Two ways to deal with it: 

1182 # (a) duplicate our C++ dump code (which itself duplicate the SQLite 

1183 # command-line executable's dump facility), then create the 

1184 # database, dump it to a string, serve the string; or 

1185 # (b) offer the binary SQLite file. 

1186 # Or... (c) both. 

1187 # Aha! pymysqlite.iterdump does this for us. 

1188 # 

1189 # If we create an in-memory database using create_engine('sqlite://'), 

1190 # can we get the binary contents out? Don't think so. 

1191 # 

1192 # So we should first create a temporary on-disk file, then use that. 

1193 

1194 # --------------------------------------------------------------------- 

1195 # Make temporary file (one whose filename we can know). 

1196 # --------------------------------------------------------------------- 

1197 # We use tempfile.mkstemp() for security, or NamedTemporaryFile, 

1198 # which is a bit easier. However, you can't necessarily open the file 

1199 # again under all OSs, so that's no good. The final option is 

1200 # TemporaryDirectory, which is secure and convenient. 

1201 # 

1202 # https://docs.python.org/3/library/tempfile.html 

1203 # https://security.openstack.org/guidelines/dg_using-temporary-files-securely.html # noqa 

1204 # https://stackoverflow.com/questions/3924117/how-to-use-tempfile-namedtemporaryfile-in-python # noqa 

1205 db_basename = "temp.sqlite3" 

1206 with tempfile.TemporaryDirectory() as tmpdirname: 

1207 db_filename = os.path.join(tmpdirname, db_basename) 

1208 # --------------------------------------------------------------------- 

1209 # Make SQLAlchemy session 

1210 # --------------------------------------------------------------------- 

1211 url = "sqlite:///" + db_filename 

1212 engine = create_engine(url, echo=False) 

1213 dst_session = sessionmaker(bind=engine)() # type: SqlASession 

1214 # --------------------------------------------------------------------- 

1215 # Iterate through tasks, creating tables as we need them. 

1216 # --------------------------------------------------------------------- 

1217 audit_descriptions = [] # type: List[str] 

1218 task_generator = gen_audited_tasks_by_task_class( 

1219 self.collection, audit_descriptions 

1220 ) 

1221 # --------------------------------------------------------------------- 

1222 # Next bit very tricky. We're trying to achieve several things: 

1223 # - a copy of part of the database structure 

1224 # - a copy of part of the data, with relationships intact 

1225 # - nothing sensitive (e.g. full User records) going through 

1226 # - adding new columns for Task objects offering summary values 

1227 # - Must treat tasks all together, because otherwise we will insert 

1228 # duplicate dependency objects like Group objects. 

1229 # --------------------------------------------------------------------- 

1230 copy_tasks_and_summaries( 

1231 tasks=task_generator, 

1232 dst_engine=engine, 

1233 dst_session=dst_session, 

1234 export_options=self.get_export_options(), 

1235 req=self.req, 

1236 ) 

1237 dst_session.commit() 

1238 if self.options.include_information_schema_columns: 

1239 # Must have committed before we do this: 

1240 write_information_schema_to_dst(self.req, dst_session) 

1241 # --------------------------------------------------------------------- 

1242 # Audit 

1243 # --------------------------------------------------------------------- 

1244 audit(self.req, f"SQL dump: {'; '.join(audit_descriptions)}") 

1245 # --------------------------------------------------------------------- 

1246 # Fetch file contents, either as binary, or as SQL 

1247 # --------------------------------------------------------------------- 

1248 if as_text: 

1249 # SQL text 

1250 connection = sqlite3.connect( 

1251 db_filename 

1252 ) # type: sqlite3.Connection # noqa 

1253 sql_text = sql_from_sqlite_database(connection) 

1254 connection.close() 

1255 return sql_text 

1256 else: 

1257 # SQLite binary 

1258 with open(db_filename, "rb") as f: 

1259 binary_contents = f.read() 

1260 return binary_contents 

1261 

1262 def get_file_body(self) -> bytes: 

1263 return self.get_sqlite_data(as_text=False) 

1264 

1265 def get_data_response(self, body: bytes, filename: str) -> Response: 

1266 return SqliteBinaryResponse(body=body, filename=filename) 

1267 

1268 

1269class SqlExporter(SqliteExporter): 

1270 """ 

1271 Converts a set of tasks to the textual SQL needed to create an SQLite file. 

1272 """ 

1273 

1274 file_extension = "sql" 

1275 viewtype = ViewArg.SQL 

1276 

1277 def __init__(self, *args, **kwargs) -> None: 

1278 super().__init__(*args, **kwargs) 

1279 self.encoding = "utf-8" 

1280 

1281 def get_file_body(self) -> bytes: 

1282 return self.get_sql().encode(self.encoding) 

1283 

1284 def get_sql(self) -> str: 

1285 """ 

1286 Returns SQL text representing the SQLite database. 

1287 """ 

1288 return self.get_sqlite_data(as_text=True) 

1289 

1290 def download_now(self) -> Response: 

1291 """ 

1292 Download the data dump in the selected format 

1293 """ 

1294 filename = self.get_filename() 

1295 sql_text = self.get_sql() 

1296 return TextAttachmentResponse(body=sql_text, filename=filename) 

1297 

1298 def get_data_response(self, body: bytes, filename: str) -> Response: 

1299 """ 

1300 Unused. 

1301 """ 

1302 pass 

1303 

1304 

1305# Create mapping from "viewtype" to class. 

1306# noinspection PyTypeChecker 

1307DOWNLOADER_CLASSES = {} # type: Dict[str, Type[TaskCollectionExporter]] 

1308for _cls in gen_all_subclasses( 

1309 TaskCollectionExporter 

1310): # type: Type[TaskCollectionExporter] # noqa 

1311 # noinspection PyTypeChecker 

1312 DOWNLOADER_CLASSES[_cls.viewtype] = _cls 

1313 

1314 

1315def make_exporter( 

1316 req: "CamcopsRequest", 

1317 collection: "TaskCollection", 

1318 options: DownloadOptions, 

1319) -> TaskCollectionExporter: 

1320 """ 

1321 

1322 Args: 

1323 req: 

1324 a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

1325 collection: 

1326 a 

1327 :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection` 

1328 options: 

1329 :class:`camcops_server.cc_modules.cc_export.DownloadOptions` 

1330 governing the download 

1331 

1332 Returns: 

1333 a :class:`BasicTaskCollectionExporter` 

1334 

1335 Raises: 

1336 :exc:`HTTPBadRequest` if the arguments are bad 

1337 """ 

1338 _ = req.gettext 

1339 if options.delivery_mode not in DownloadOptions.DELIVERY_MODES: 

1340 raise HTTPBadRequest( 

1341 f"{_('Bad delivery mode:')} {options.delivery_mode!r} " 

1342 f"({_('permissible:')} " 

1343 f"{DownloadOptions.DELIVERY_MODES!r})" 

1344 ) 

1345 try: 

1346 downloader_class = DOWNLOADER_CLASSES[options.viewtype] 

1347 except KeyError: 

1348 raise HTTPBadRequest( 

1349 f"{_('Bad output type:')} {options.viewtype!r} " 

1350 f"({_('permissible:')} {DOWNLOADER_CLASSES.keys()!r})" 

1351 ) 

1352 return downloader_class(req=req, collection=collection, options=options) 

1353 

1354 

1355# ============================================================================= 

1356# Represent files for users to download 

1357# ============================================================================= 

1358 

1359 

1360class UserDownloadFile(object): 

1361 """ 

1362 Represents a file that has been generated for the user to download. 

1363 

1364 Test code: 

1365 

1366 .. code-block:: python 

1367 

1368 from camcops_server.cc_modules.cc_export import UserDownloadFile 

1369 x = UserDownloadFile("/etc/hosts") 

1370 

1371 print(x.when_last_modified) # should match output of: ls -l /etc/hosts 

1372 

1373 many = UserDownloadFile.from_directory_scan("/etc") 

1374 

1375 """ 

1376 

1377 def __init__( 

1378 self, 

1379 filename: str, 

1380 directory: str = "", 

1381 permitted_lifespan_min: float = 0, 

1382 req: "CamcopsRequest" = None, 

1383 ) -> None: 

1384 """ 

1385 Args: 

1386 filename: 

1387 Filename, either absolute, or if ``directory`` is specified, 

1388 relative to ``directory``. 

1389 directory: 

1390 Directory. If specified, ``filename`` must be within it. 

1391 req: 

1392 a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

1393 

1394 Notes: 

1395 

1396 - The Unix ``ls`` command shows timestamps in the current timezone. 

1397 Try ``TZ=utc ls -l <filename>`` or ``TZ="America/New_York" ls -l 

1398 <filename>`` to see this. 

1399 - The underlying timestamp is the time (in seconds) since the Unix 

1400 "epoch", which is 00:00:00 UTC on 1 Jan 1970 

1401 (https://en.wikipedia.org/wiki/Unix_time). 

1402 """ 

1403 self.filename = filename 

1404 self.permitted_lifespan_min = permitted_lifespan_min 

1405 self.req = req 

1406 

1407 self.basename = os.path.basename(filename) 

1408 _, self.extension = os.path.splitext(filename) 

1409 if directory: 

1410 # filename must be within the directory specified 

1411 self.directory = os.path.abspath(directory) 

1412 candidate_path = os.path.abspath( 

1413 os.path.join(self.directory, filename) 

1414 ) 

1415 if os.path.commonpath([directory, candidate_path]) != directory: 

1416 # Filename is not within directory. 

1417 # This is dodgy -- someone may have passed a filename like 

1418 # "../../dangerous_dir/unsafe_content.txt" 

1419 self.fullpath = "" 

1420 # ... ensures that "exists" will be False. 

1421 else: 

1422 self.fullpath = candidate_path 

1423 else: 

1424 # filename is treated as an absolute path 

1425 self.directory = "" 

1426 self.fullpath = filename 

1427 

1428 try: 

1429 self.statinfo = os.stat(self.fullpath) 

1430 self.exists = True 

1431 except FileNotFoundError: 

1432 self.statinfo = None # type: Optional[os.stat_result] 

1433 self.exists = False 

1434 

1435 # ------------------------------------------------------------------------- 

1436 # Size 

1437 # ------------------------------------------------------------------------- 

1438 

1439 @property 

1440 def size(self) -> Optional[int]: 

1441 """ 

1442 Size of the file, in bytes. Returns ``None`` if the file does not 

1443 exist. 

1444 """ 

1445 return self.statinfo.st_size if self.exists else None 

1446 

1447 @property 

1448 def size_str(self) -> str: 

1449 """ 

1450 Returns a pretty-format string describing the file's size. 

1451 """ 

1452 size_bytes = self.size 

1453 if size_bytes is None: 

1454 return "" 

1455 return bytes2human(size_bytes) 

1456 

1457 # ------------------------------------------------------------------------- 

1458 # Timing 

1459 # ------------------------------------------------------------------------- 

1460 

1461 @property 

1462 def when_last_modified(self) -> Optional[Pendulum]: 

1463 """ 

1464 Returns the file's modification time, or ``None`` if it doesn't exist. 

1465 

1466 (Creation time is harder! See 

1467 https://stackoverflow.com/questions/237079/how-to-get-file-creation-modification-date-times-in-python.) 

1468 """ # noqa 

1469 if not self.exists: 

1470 return None 

1471 # noinspection PyTypeChecker 

1472 creation = Pendulum.fromtimestamp( 

1473 self.statinfo.st_mtime, tz=get_tz_utc() 

1474 ) # type: Pendulum 

1475 # ... gives the correct time in the UTC timezone 

1476 # ... note that utcfromtimestamp() gives a time without a timezone, 

1477 # which is unhelpful! 

1478 # We would like this to display in the current timezone: 

1479 return creation.in_timezone(get_tz_local()) 

1480 

1481 @property 

1482 def when_last_modified_str(self) -> str: 

1483 """ 

1484 Returns a formatted string with the file's modification time. 

1485 """ 

1486 w = self.when_last_modified 

1487 if not w: 

1488 return "" 

1489 return format_datetime(w, DateFormat.ISO8601_HUMANIZED_TO_SECONDS) 

1490 

1491 @property 

1492 def time_left(self) -> Optional[Duration]: 

1493 """ 

1494 Returns the amount of time that this file has left to live before 

1495 the server will delete it. Returns ``None`` if the file does not exist. 

1496 """ 

1497 if not self.exists: 

1498 return None 

1499 now = get_now_localtz_pendulum() 

1500 death = self.when_last_modified + Duration( 

1501 minutes=self.permitted_lifespan_min 

1502 ) 

1503 remaining = death - now # type: Period 

1504 # Note that Period is a subclass of Duration, but its __str__() 

1505 # method is different. Duration maps __str__() to in_words(), but 

1506 # Period maps __str__() to __repr__(). 

1507 return remaining 

1508 

1509 @property 

1510 def time_left_str(self) -> str: 

1511 """ 

1512 A string version of :meth:`time_left`. 

1513 """ 

1514 t = self.time_left 

1515 if not t: 

1516 return "" 

1517 return t.in_words() # Duration and Period do nice formatting 

1518 

1519 def older_than(self, when: Pendulum) -> bool: 

1520 """ 

1521 Was the file created before the specified time? 

1522 """ 

1523 m = self.when_last_modified 

1524 if not m: 

1525 return False 

1526 return m < when 

1527 

1528 # ------------------------------------------------------------------------- 

1529 # Deletion 

1530 # ------------------------------------------------------------------------- 

1531 

1532 @property 

1533 def delete_form(self) -> str: 

1534 """ 

1535 Returns HTML for a form to delete this file. 

1536 """ 

1537 if not self.req: 

1538 return "" 

1539 dest_url = self.req.route_url(Routes.DELETE_FILE) 

1540 form = UserDownloadDeleteForm(request=self.req, action=dest_url) 

1541 appstruct = {ViewParam.FILENAME: self.filename} 

1542 rendered_form = form.render(appstruct) 

1543 return rendered_form 

1544 

1545 def delete(self) -> None: 

1546 """ 

1547 Deletes the file. Does not raise an exception if the file does not 

1548 exist. 

1549 """ 

1550 try: 

1551 os.remove(self.fullpath) 

1552 log.info(f"Deleted file: {self.fullpath}") 

1553 except OSError: 

1554 pass 

1555 

1556 # ------------------------------------------------------------------------- 

1557 # Downloading 

1558 # ------------------------------------------------------------------------- 

1559 

1560 @property 

1561 def download_url(self) -> str: 

1562 """ 

1563 Returns a URL to download this file. 

1564 """ 

1565 if not self.req: 

1566 return "" 

1567 querydict = {ViewParam.FILENAME: self.filename} 

1568 return self.req.route_url(Routes.DOWNLOAD_FILE, _query=querydict) 

1569 

1570 @property 

1571 def contents(self) -> Optional[bytes]: 

1572 """ 

1573 The file contents. May raise :exc:`OSError` if the read fails. 

1574 """ 

1575 if not self.exists: 

1576 return None 

1577 with open(self.fullpath, "rb") as f: 

1578 return f.read() 

1579 

1580 # ------------------------------------------------------------------------- 

1581 # Bulk creation 

1582 # ------------------------------------------------------------------------- 

1583 

1584 @classmethod 

1585 def from_directory_scan( 

1586 cls, 

1587 directory: str, 

1588 permitted_lifespan_min: float = 0, 

1589 req: "CamcopsRequest" = None, 

1590 ) -> List["UserDownloadFile"]: 

1591 """ 

1592 Scans the directory and returns a list of :class:`UserDownloadFile` 

1593 objects, one for each file in the directory. 

1594 

1595 For each object, ``directory`` is the root directory (our parameter 

1596 here), and ``filename`` is the filename RELATIVE to that. 

1597 

1598 Args: 

1599 directory: directory to scan 

1600 permitted_lifespan_min: lifespan for each file 

1601 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

1602 """ 

1603 results = [] # type: List[UserDownloadFile] 

1604 # Imagine directory == "/etc": 

1605 for root, dirs, files in os.walk(directory): 

1606 # ... then root might at times be "/etc/apache2" 

1607 for f in files: 

1608 fullpath = os.path.join(root, f) 

1609 relative_filename = relative_filename_within_dir( 

1610 fullpath, directory 

1611 ) 

1612 results.append( 

1613 UserDownloadFile( 

1614 filename=relative_filename, 

1615 directory=directory, 

1616 permitted_lifespan_min=permitted_lifespan_min, 

1617 req=req, 

1618 ) 

1619 ) 

1620 return results