Monday, March 22, 2010

Dir listing recursive to xml

I use this piece of code to produce an XML file with the following structure:
<tabs>
  <item id="0">
    <dir>..\SomeDirectory\SomeSubDirectory</dir>
    <file>SomeFile.name</file>
  </item>
</tabs>

Basically, it walks recursively from the given root directory through all subdirectories. I have also included an exclude list for directories that are to be ignored, and a file type list (by extension) for the files that are to be included.
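The traversal can also be sketched as a small generator. This is only an illustration, not the script in this post: the function name `iter_matching_files` and its parameters are made up here, and it prunes excluded directory names at every depth by editing the directory list that `os.walk` yields.

```python
import os


def iter_matching_files(root_dir, exclude_dirs, include_exts):
    """Yield (directory, filename) pairs for files with a wanted extension.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    exts = tuple(include_exts)
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Editing dirnames in place stops os.walk from descending into
        # excluded directories (this relies on the default topdown=True).
        dirnames[:] = sorted(d for d in dirnames if d not in exclude_dirs)
        for name in sorted(filenames):
            # str.endswith accepts a tuple, so one call checks every extension.
            if name.endswith(exts):
                yield dirpath, name
```

Looping over `iter_matching_files("..", exclude_dirs, include_files)` would then give one pair per matching file, which keeps the filtering separate from the XML building.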
NOTE: I use lxml here for its pretty_print option, which lays the XML out nicely in the file. lxml does not ship with the standard Python 2.6 distribution, so I had to run
easy_install lxml==2.2.2
from the command prompt. Then everything worked just fine.
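If installing lxml is not an option, the standard library can probably get close. The sketch below is an assumption on my part, not what the script in this post uses: it builds the same kind of tree with `xml.etree.ElementTree` and indents it through `xml.dom.minidom`, since the standard-library `tostring` has no `pretty_print` argument.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Build a tiny tree with the same element names as the script in this post.
root = ET.Element("tabs")
item = ET.SubElement(root, "item")
item.set("id", "0")
ET.SubElement(item, "dir").text = "..\\SomeDirectory\\SomeSubDirectory"
ET.SubElement(item, "file").text = "SomeFile.name"

# minidom re-parses the serialized tree and writes it back with indentation.
pretty = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")
print(pretty)
```

The exact whitespace around text nodes differs between Python versions, so lxml remains the more predictable choice when the layout matters.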

#! /usr/bin/env python

import os
from lxml import etree

# exclude_dirs = ['bot', 'js', 'metronome', 'MyTabs', 'PTB', 'Software', 'Tools', 'vamp2']
exclude_dirs = ['XML', 'eclipse', 'GoogleApps', 'Java']
# A tuple, so that str.endswith can test all extensions in one call
include_files = ('.mp4', '.mov', '.pdf', '.flv', '.html')

root = etree.Element("tabs")
item_id = 0

# Top-level entries of the parent directory; names containing a dot are
# treated as files and skipped, as are the excluded directories
dirs = [os.path.join(os.pardir, name) for name in os.listdir(os.pardir)
        if name.find(".") == -1 and name not in exclude_dirs]
for x in dirs:
    # os.walk recurses on its own and yields each directory once,
    # together with the files it contains
    for proot, pdirs, pfiles in os.walk(x):
        for f in pfiles:
            if f.endswith(include_files):
                item = etree.SubElement(root, "item")
                item.set('id', str(item_id))
                item_id += 1
                mdir = etree.SubElement(item, "dir")
                mdir.text = proot
                mfile = etree.SubElement(item, "file")
                mfile.text = f

# Binary mode: with an explicit encoding, tostring returns bytes and
# emits the matching XML declaration itself
with open('./test.xml', 'wb') as fh:
    fh.write(etree.tostring(root, pretty_print=True,
                            xml_declaration=True, encoding='ISO-8859-1'))
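
To check the result, the file can be read back and its entries listed. A minimal sketch, assuming a file shaped like the one the script writes (a tabs root holding item elements). The helper name `read_items` is made up for illustration, and the standard-library parser is used so the check does not depend on lxml.

```python
import xml.etree.ElementTree as ET


def read_items(path):
    """Return a list of (id, dir, file) tuples from a tabs XML file.

    Hypothetical helper, named for this sketch only.
    """
    tree = ET.parse(path)
    entries = []
    # Each <item> carries an id attribute plus <dir> and <file> children.
    for item in tree.getroot().findall("item"):
        entries.append((item.get("id"),
                        item.findtext("dir"),
                        item.findtext("file")))
    return entries
```

Calling `read_items('./test.xml')` after running the script should return one tuple per file that was found.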