Contributors

Pydoop is developed by CRS4 and generously hosted by SourceForge.

News

New in 0.9.0

  • Added explicit support for:
    • Apache Hadoop 1.1.2
    • CDH 4.2.0
  • Added support for Cloudera from-parcels layout (as installed by Cloudera Manager)
  • Added pydoop.hdfs.move() (see the sketch after this list)
  • Record writers can now be used in map-only jobs
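As a quick illustration of the new function, here is a minimal sketch; the HDFS paths are made-up examples.

import pydoop.hdfs as hdfs

# move a file from one HDFS location to another
# (both paths here are hypothetical)
hdfs.move("input/data.txt", "archive/data.txt")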

New in 0.8.1

  • Fixed a problem that was breaking installation from PyPI via pip install

New in 0.8.0

  • Added support for Apple OS X Mountain Lion
  • Added support for Hadoop 1.1.1
  • Patches now include a fix for HDFS-829
  • Restructured docs
    • A separate tutorial section collects and expands introductory material

New in 0.7.0

  • Added Debian package

New in 0.7.0-rc3

  • Fixed a bug in the hdfs instance caching method

New in 0.7.0-rc2

  • Support for HDFS append open mode (see the sketch after this list)
    • opening in append mode fails if your Hadoop version and/or configuration does not support HDFS append
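A minimal sketch of the new open mode, assuming the cluster is configured with append support and that the mode string "a" selects append, as in the built-in open; the path is a made-up example.

import pydoop.hdfs as hdfs

# open an existing HDFS file in append mode and add a line at the end
f = hdfs.open("logs/events.txt", "a")
f.write("new event\n")
f.close()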

New in 0.7.0-rc1

  • Works with CDH4, with the following limitations:
    • support for MapReduce v1 only
    • CDH4 must be installed from dist-specific packages (no tarball)
  • Tested with the latest releases of other Hadoop versions
    • Apache Hadoop 0.20.2, 1.0.4
    • CDH 3u5, 4.1.2
  • Simpler build process
    • the source code we need is now included, rather than searched for at compile time
  • Pydoop scripts can now accept user-defined configuration parameters
    • New examples show how to use the new feature
  • New wrapper object makes it easier to interact with the JobConf
  • New hdfs.path functions: isdir, isfile, kind
  • HDFS: chmod now also accepts a string description of the permission mode (see the sketch after this list)
  • Several bug fixes
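A minimal sketch of the new hdfs.path helpers and the string form of chmod; the path and the mode string are made-up examples, and the exact return value of kind is documented in the API reference.

import pydoop.hdfs as hdfs

path = "data/part-00000"
print(hdfs.path.isdir(path))   # True if path is an HDFS directory
print(hdfs.path.isfile(path))  # True if path is a regular HDFS file
print(hdfs.path.kind(path))    # the kind of path, e.g. "file" or "directory"

# chmod now also understands a string description of the mode
hdfs.chmod(path, "g+w")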

New in 0.6.6

Fixed a bug that was causing the pipes runner to incorrectly preprocess command line options.

New in 0.6.4

Fixed several bugs triggered by using a local fs as the default fs for Hadoop. This happens when you set a file: path as the value of fs.default.name in core-site.xml. For instance:

<property>
  <name>fs.default.name</name>
  <value>file:///var/hadoop/data</value>
</property>

New in 0.6.0

  • The HDFS API features new high-level tools for easier manipulation of files and directories: see the API docs for more info and the sketch after this list
  • Examples have been thoroughly revised in order to make them easier to understand and run
  • Several bugs were fixed; we also introduced a few optimizations, most notably the automatic caching of HDFS instances
  • We have pushed our code to a Git repository hosted by SourceForge. See the Installation section for instructions.
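A minimal sketch of the kind of one-liners the high-level layer allows; dump, load and ls are used here on the assumption that they are among the new tools (the API docs are the authoritative list), and the paths are made-up examples.

import pydoop.hdfs as hdfs

# write a string to an HDFS file and read it back
hdfs.dump("hello, pydoop\n", "test/hello.txt")
print(hdfs.load("test/hello.txt"))

# list the contents of an HDFS directory
print(hdfs.ls("test"))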

New in 0.5.0

  • Pydoop now works with Hadoop 1.0
  • Multiple versions of Hadoop can now be supported by the same installation of Pydoop. See the section on building for multiple Hadoop versions for details
  • We have added a command line tool that makes it trivially simple to write short scripts for simple problems (see the sketch after this list).
  • In order to work out-of-the-box, Pydoop now requires Python 2.7. Python 2.6 can be used provided that you install a few additional modules (see the installation page for details).
  • We have dropped support for the 0.21 branch of Hadoop, which has been marked as unstable and unsupported by Hadoop developers.
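As an illustration of the new command line tool, here is a minimal word count sketch; the mapper and reducer signatures follow the Pydoop Script API, and the module name and HDFS paths are made-up examples.

# wordcount.py -- run with: pydoop script wordcount.py hdfs_input hdfs_output

def mapper(key, value, writer):
    # value is a line of input text; emit a count of 1 for each word
    for word in value.split():
        writer.emit(word, 1)

def reducer(word, counts, writer):
    # counts is an iterable over the (string) counts emitted for this word
    writer.emit(word, sum(map(int, counts)))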