|
pyxmdb.py is a free tool to eXport or iMport data from/to a database. It is written in Python (v2.5 and v2.4) and is meant to be as database neutral as possible. Anything that is DBMS specific, is stored in a user editable config file (pyxmdb.py.cfg).
If desired, you could use this to perform the 'E' and 'L' parts of an ETL (Extract, Transform, Load) process. The 'T' could then be performed by any other application or utility of your choice. This easiest way to do this, would be to have one copy of pyxmdb export the data to standard out, pipe the output to something like awk or sed, and then pipe that output to another copy of pyxmdb to import the data.
License: GNU GPL v2
To contribute new/better code, send me an email. My address is in the .py file.
My Freshmeat project page is located here, should you wish to stay updated easily.
|
Current feature list:
* import or export from/to a comma delimited text file into/from a MySQL, Oracle, PostgreSQL, Ingres or SQLite database.
* flexible logging which can be altered via a config file
* additional DBMS support can be added via a config file (no code changes required)
* import has support for auto commit after every "x" rows
* import can abort the process after "x" many errors
* export can write out a header record as the first row
* import can skip the first row if it is a header record
* export will take any valid SQL select statement and write the result to the output file
* import has some flexibility for mapping fields in the source file to columns in the destination table
* has been test on both Linux (FC7) and Windows (XP) with Python 2.4 and 2.5
* import can read from STDIN if no filename given
* export can write to STDOUT if no filename given
* --presql parameter for specifying an SQL command to run before the import/export
* --postsql parameter for specifying an SQL command to run after the import/export
* support for reading/writing compressed files (gzip and bzip2)
* using the --presqlfile, --postsqlfile, or --sqlfile; you can provide a .sql file with commands to run for the pre / post / main SQL parameters.
|
Basic usage:
./pyxmdb.py -x -f EXPORT.csv -u USER -w PASSWORD -d DBNAME -b PYTHONDBMSMODULE -q "select * from TABLE"
./pyxmdb.py -i -f IMPORT.csv -u USER -w PASSWORD -d DBNAME -b PYTHONDBMSMODULE -q "insert into TABLE values (%s)"
Where:
EXPORT.csv and IMPORT.csv are comma delimited text files (excel format)
USER and PASSWORD are the user name and password required for you to connect to the DBMS
DBNAME is the name of the database
PYTHONDBMSMODULE is one of the supported Python DBMS modules defined in pyxmdb.py.cfg
TABLE is the name of the table in your database.
Note: export will take any valid select query, including those which involve more than one table import may require use of the "insert into TABLE (col2,col3,col5) values (%s)" if IMPORT.csv doesn't match the structure of TABLE exactly.
Full list of command line paramters (by category):
-- MANDATORY --
(-i OR -x, if both specified then the last one will be used)
-i,--import = tells pyxmdb that you want to import data
-x,--export = tells pyxmdb that you want to export data
(-q/--sql OR --sqlfile, if both specified then -q/--sql will take priority)
-q,--sql = the select or insert statement to use for the export or import
--sqlfile = the file containing the select or insert statement to use for the export or import
(--dsn OR the minimal set of other database parameters required for the given DBMS. typically -u -w -d)
-s,--dsn = if the DBMS supports it, you can specify a single string with all
of the database parameters to use for connecting
-u,--dbuser = user name for the DB
-w,--dbpassword = password for the DB
-d,--dbname = name of the database to connect to
For Oracle, this is the SID, for SQLite it is the filename
(technically -f and -b are optional, but they are very highly recommended)
-f,--filename = name of the datafile to import from or export to (default is STDIN / STDOUT)
-b,--dbms = the Python DB module to load and use for the DB connection (default = sqlite3)
Note: the default can be changed in the config file pyxmdb.py.cfg in "[dbms],default"
-- OPTIONAL --
Database:
-p,--dbport = port number to use for connecting to the DB
-v,--dbserver = IP or hostname of the DB server.
-c,--commitcount = number of rows to insert before issuing a COMMIT
-e,--errorcount = number of insert exceptions to tolerate before aborting the transaction
--header = if exporting, the column names will be written out as the first line of
the file if importing, the first line of the file will be skipped
--presql parameter for specifying an SQL command to run before the import/export
--postsql parameter for specifying an SQL command to run after the import/export
--presqlfile = the file containing the SQL command to run before the export/import
--postsqlfile = the file containing the SQL command to run after the export/import
Miscelaneous:
-n,--cfgfile = fully qualified name of the config file containing the logging and
DBMS configuration
-h,--help = displays this message
--bz2 = implies that the name given for --filename is a bzip2 compressed file
--gzip = implies that the name given for --filename is a gzip compressed file
Format: (refer to the docs for Python's "csv" module for details)
--quoting = how agressive the quoting should be.
--lineterminator = what character(s) to use for a new line (default is \r\n)
--skipinitialspace = should we skip leading spaces
--doublequote = if "quotechar" appears inside of a string that is itself quoted, how should it be written False = use "escapechar", True = double up (use two "quotechar")
--escapechar = character to use when escaping a quote
--quotechar = character to use in place of a single quote (default = ')
--delimiter = character to use for separating fields/columns in the text file (default = ,)
--dialect = a csv dialect to use (default = excel)
|
v0.7
* - Minor bug fix. During the code cleanup for v0.6 I accidentally broke support for using the "--sqlfile" command line parameter during an export.
v0.6
* Major updates and improvements to the code, but no new functionality. This is partially in preparation for some new features that will be coming soon.
v0.5
* Added code to allow the use of text files containing SQL statements to be used for the pre / post / main SQL commands. Refer to the --presqlfile, --postsqlfile and --sqlfile parameters for more information.
* Minor updates and improvements to other parts of the code.
v0.4
* Added code to support the existing csv formatting parameters, and added a new "dialect" parameter.
* Added support for --presql and --postsql to run any arbitrary SQL command before / after the import or export. Both are optional, and may be used independent of each other.
* Added support for compressed files for either import or export. If --gzip given, then filename is open as a gzip compressed file. If --bz2 given, then filename is opened as a bzip2 compressed file.
* Removed some no longer required parameters surrounding logging and data directories.
* If --filename is not provided, then STDIN/STDOUT will be used for the import/export.
* Users can now choose their own default DBMS in the config file.
v0.3
* Fixed a bug with the default logging format string. Python 2.4 would generate an exception when %(funcName)s was used from the "main" part of the program. Thanks to Chris Clark for finding and reporting this bug.
* Added support for Ingres (ingresdbi) to the config file. Again, thanks go to Chris Clark for contributing this.
* Added extra exception handling to the program in several places.
* Users may now choose which DBMS module they wish to use as a default (used to be MySQLdb) in the [dbms] section of the config file. Currently set to sqlite3, as it is now included as a standard module in Python 2.5
* Removed the defaulting of dbserver=localhost, as it now interferes with the new method of config file driven DBMS connections.
v0.2
* enhanced the output of "--help"
* changed default logging output to be "INFO" level instead of debug
* few minor changes to the config file
* few miscellaneous updates to the code
|
|