If I want to carry my music collection around with me, the only two companies that offer devices with a large enough capacity are Apple with its iPod products and Microsoft with its Zune products. Neither set of devices supports audio formats such as ogg or flac.

Certain generations of iPods can run the Rockbox operating system, which can play these and many other formats. However, Rockbox’s support for third-party accessories using the Apple Accessory Protocol is very limited. If I wanted to use an iPod with my car’s in-dash navigation unit and have full functionality to browse and shuffle songs, I’d need to use the stock iPod operating system.

I’ve used software such as Banshee and Floola before to convert my audio collection during the syncing process. Both programs are open source and work fairly well, but the conversion process can take a long time: up to three days to resync an iPod from scratch.

Floola’s conversion process is single-threaded, so it will only convert one song at a time even on a multi-processor system. Banshee is very buggy and will sometimes fail to update the iPod’s music database, leaving all the audio copied to the iPod but inaccessible from the user interface. Banshee is also limited to MP3 and cannot convert audio to aac.

So I decided to keep two copies of my music collection. I’d keep one copy in iTunes-compatible formats and use that copy to sync with my iPod. I’d keep my originals in the formats I currently have, which are a mix of mp3s, wmas, oggs and flacs.
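The rule behind the second copy is small enough to state in code: every file keeps its relative path, and only ogg and flac files change extension because they get transcoded. Here is a minimal sketch of that mapping. `mirror_path` is a hypothetical helper name used only for illustration, and it only computes paths without touching any files.

```python
import os

# Formats the stock iPod firmware cannot play, per the text above.
NEEDS_TRANSCODE = ('.ogg', '.flac')

def mirror_path(src_root, dest_root, path):
    """Map a file in the original tree to its place in the iPod copy.

    Returns (destination_path, needs_transcode). Hypothetical helper,
    for illustration only.
    """
    # relpath copes with a trailing slash on src_root
    rel = os.path.relpath(path, src_root)
    base, ext = os.path.splitext(rel)
    if ext.lower() in NEEDS_TRANSCODE:
        # transcoded files land next to the copied ones, as .m4a
        return os.path.join(dest_root, base + '.m4a'), True
    # everything else (mp3, wma, cover art, ...) is copied as-is
    return os.path.join(dest_root, rel), False
```

Calling `mirror_path('music', 'ipod', 'music/album/song.flac')` gives a destination ending in `song.m4a` and a flag saying the file needs transcoding, while an mp3 in the same folder keeps its name and is only copied.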

The following script will copy one music tree to a new folder, leaving formats such as mp3 and wma untouched, while converting all my ogg and flac files into aac files using Nero’s AAC codec for Linux (note: although Nero’s AAC codec is free, it is not open source). The script is written in Python 2 and uses flac and vorbis-tools for decoding.

#!/usr/bin/env python
#
#  eyePodify.py - version 0.7
#
# A script for recursively converting a folder of audio files into iPod compatible formats. 
# All flac and ogg files are transcoded to aac. All other, including non-audio, files are 
# copied unmodified to the destination. Files that already exist in the destination tree 
# will be skipped.
#
# For Usage:
#  ./eyePodify.py --help
#
# Dependencies: 
#    python       >= 2.7
#    vorbis-tools >= 1.1 
#    flac         >= 1.2  
#    neroAac      >= 1.5 (Note: Free but Closed Source)
#                         http://www.nero.com.tw/eng/downloads-nerodigital-nero-aac-codec.php
#
# Copyright 2010 Sumit Khanna. Free for non-commercial use. http://penguindreams.org

import sys
import os
import subprocess
import tempfile
import logging
import shutil
import multiprocessing
import threading
import datetime
from optparse import OptionParser,OptionGroup

def scanSourceTree(src,dest,encRate,numProcs,testRun,customMeta):

   tSem = threading.Semaphore(numProcs)
   logging.debug('Maximum number of threads %d' % numProcs)

   for root, dirs, files in os.walk(src):
     for f in files:
        orig = os.path.join(root,f)
        #relpath copes with a trailing slash on src, which root[len(src)+1:] did not
        new  = os.path.join(dest,os.path.normpath(os.path.join(os.path.relpath(root,src),f)))
        basename,extension = os.path.splitext(new)
        if incompatiableFormat(extension):
          newtran = basename + '.m4a'
          if not os.path.exists(newtran):
            logging.info( 'transcoding %s to %s' % (f,newtran) )
            if not testRun:
              tCode = Transcoder(orig,newtran,encRate,customMeta,tSem)
              #blocks here until one of the running transcoders releases a slot
              tSem.acquire()
              tCode.start()
          else:
            logging.debug('transcoded file exists. skipping %s' % newtran )
        else:
          if not os.path.exists(new):
            logging.info( 'copying %s to %s' % (f,new) )
            if not testRun:
              ensure_dir(new)
              shutil.copy2(orig,new)
          else:
            logging.debug('file exists. skipping %s' % (f) )


class Transcoder(threading.Thread):

  def __init__(self,src,dest,encRate,customMeta,tSem):
    super(Transcoder,self).__init__()
    self.src = src
    self.dest = dest
    self.encRate = encRate
    self.semaphore = tSem
    self.customMeta = customMeta

  def runprocess(self,args):
   logging.debug('running command: ' + ' '.join(args) )
   proc = subprocess.Popen(args,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
   output,error = proc.communicate()
   status = proc.returncode 
   oot = 'Output:\n%s\nError:\n%s\n' % (output,error)
   return oot,status

  def pullMetaData(self,src,codec):
    output = None
    status = None
    data = []

    logging.debug('parsing %s metadata for %s' % (codec,src))

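    if codec == 'ogg':
      output,status = self.runprocess(['ogginfo',src])
      for dataLine in output.splitlines():
        #split on the first '=' only, so values that contain '=' are kept whole
        line = dataLine.split('=',1)
        if len(line) == 2:
          data.append(line)
    elif codec == 'flac':
      output,status = self.runprocess(['metaflac','--list','--block-type=VORBIS_COMMENT',src])
      for dataLine in output.splitlines():
        #lines look like 'comment[0]: TITLE=value'; split once so ':' and '=' in the value survive
        line = dataLine.strip().split(':',1)
        if len(line) == 2 and line[0].startswith('comment['):
          meta = line[1].strip().split('=',1)
          if len(meta) == 2:
            data.append(meta)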

    if status != 0:
       logging.warning('%s metadata extractor returned %d. possible corrupt %s' % (codec,status,codec))

    return data


  def writeMetaData(self,dest,data):
     metaCmd = ['neroAacTag',dest]
     for meta in data:
       aacTag = vorbisMetaToAAC(meta[0].lower().strip())
       if aacTag is not None:
          metaCmd.append('-meta:%s=%s' % (aacTag,meta[1].strip()))
          logging.debug('pulled standard-meta %s = %s' % (aacTag,meta[1]))
       else:
          if self.customMeta == True:
            metaCmd.append('-meta-user:%s=%s' % (meta[0].lower().strip(),meta[1].strip()))
            logging.debug('pulled custom-meta %s = %s' % (meta[0].lower(),meta[1]))
          else:
            logging.debug('ignoring custom-meta %s = %s' % (meta[0].lower(),meta[1]))

     output,status = self.runprocess(metaCmd)
     if status != 0:
        logging.warning('meta tagger exited with error %d: %s' % (status,output))


  def run(self):
   #release the semaphore even if transcode() raises, so the scan is never left waiting
   try:
     self.transcode()
   except Exception:
     logging.exception('transcoding %s failed' % self.src)
   finally:
     self.semaphore.release()

  def transcode(self):
   src = self.src
   dest = self.dest
   encRate = self.encRate

   logging.debug('source: %s\ndest: %s' % (src,dest))

   srcHead,srcFile = os.path.split(src)
   dstName,dstExt = os.path.splitext(srcFile)
   dstNew = dstName + '.m4a'
   codec = dstExt.lower().strip('.')

   logging.info('transcoding (wav) %s' % (dstName))

   #create temp file
   interWav = tempfile.NamedTemporaryFile(delete=False)
   tmpName = interWav.name
   interWav.close()
   logging.debug('created tmp wav file %s' % tmpName)

   #decode the source file to a temporary WAV
   #output,status = self.runprocess(['mplayer','-vc','null','-vo','null','-ao','pcm:fast','-ao','pcm:file='+tmpName,src])
   if codec == 'ogg':
      output,status = self.runprocess(['oggdec','-o',tmpName,src])
   elif codec == 'flac':
      output,status = self.runprocess(['flac','-d','-F','-f','-o',tmpName,src])
   else:
      #not expected, but avoids a NameError on status below
      output,status = ('unsupported codec: %s' % codec),1

   if status == 0:
     logging.info('transcoding (aac) %s' % dstNew)

     ensure_dir(dest)

     #transcode Wave to AAC with 2 pass Nero Encoder
     rate = '%s' % (encRate * 1000)
     output,status = self.runprocess(['neroAacEnc','-br',rate,'-2pass','-if',tmpName,'-of',dest])

     if(status == 0):
        logging.info('transcoding %s complete. copying metadata.' % dstNew)

        #copy metadata
        data = self.pullMetaData(src,codec)
        self.writeMetaData(dest,data)

     else:
        logging.error('transcoding (aac) unsuccessful. error code: %d.\n\noutput\n%s' % (status,output))
   else:
     logging.error('transcoding (WAV) unsuccessful. error code: %d.\n\noutput\n%s' % (status,output))

   #delete tmp file
   if os.path.isfile(tmpName):
     logging.debug('removing tmp file %s' % tmpName)
     os.unlink(tmpName)

def vorbisMetaToAAC(tag):
   if tag == 'date':
      return 'year'
   elif tag == 'tracknumber':
      return 'track'
   elif tag == 'tracktotal':
      return 'totaltracks'
   elif tag == 'discnumber':
      return 'disc'
   elif tag == 'title' or tag == 'artist' or tag == 'genre' or tag == 'album':
      return tag
   else:
      return None
   
def incompatiableFormat(ext):
   #case-insensitive, so files named .OGG or .FLAC are transcoded as well
   return ext.lower() in ('.ogg','.flac')

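def ensure_dir(f):
   logging.debug('checking directory %s' % f)
   d = os.path.dirname(f)
   if not os.path.exists(d):
     logging.debug('creating directory %s' % d)
     try:
       os.makedirs(d)
     except OSError:
       #another thread may have created it between the check and makedirs
       if not os.path.isdir(d):
         raise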

def logfile_arg():
   def func(option,opt_str,value,parser):
      if parser.rargs and not parser.rargs[0].startswith('-'):
         val=parser.rargs[0]
         parser.rargs.pop(0)
      else:
         #defaults to program_name_YYYY-MM-DD_HHMMSS.log
         val = sys.argv[0] + '_' + datetime.datetime.now().strftime('%Y-%m-%d_%H%M%S') + '.log'
      setattr(parser.values,option.dest,val)
   return func

if __name__ == "__main__":

   parser = OptionParser(usage="%prog [-dt] [-b (bitrate)] [-l logfile] <source-tree> <destination-tree>",
                         description="A script for recursively converting a folder of audio files into iPod compatible formats. All flac and ogg files are transcoded to aac. All other, including non-audio, files are copied unmodified to the destination. Files that already exist in the destination tree will be skipped.\n", version="%prog 0.7", epilog='Copyright 2010 Sumit Khanna. Free for non-commercial use. PenguinDreams.org')
   parser.add_option('-d','--debug',action='store_true',help='show additional debugging output')
   parser.add_option('-l','--logfile',action='callback',callback=logfile_arg(),help='store output to logfile [default: %s_yyyy-mm-dd_hhmmss.log]' % sys.argv[0],metavar='FILE',dest='logfile')
   parser.set_defaults(verbose=True)
   parser.add_option('-t','--test',action='store_true',help='test run (no files copied/encoded)')
   parser.add_option('-v', action='store_true', dest='verbose', help='verbose output (default, combine with -d for additional information)')
   parser.add_option('-q', action='store_false', dest='verbose', help='run silent')

   encoderOpts = OptionGroup(parser,'Encoding Options')
   encoderOpts.add_option('-b',type='int',help='target bitrate for aac in kbps [default: %default]',metavar='BITRATE',default=192)
   encoderOpts.add_option('-j',type='int',help='number of encoder processes to launch. [defaults to one plus the total number of CPUs (currently: %default)]',metavar="PROCS",default=multiprocessing.cpu_count() + 1)
   encoderOpts.add_option('-c','--custom-meta',action='store_true',dest='customMeta',help='Copy non-standard AAC meta data')
   parser.add_option_group(encoderOpts)

   (options, args) = parser.parse_args()

   if len(args) != 2:
      parser.error('You must specify a source and destination tree')
   elif (not os.path.isdir(args[0])) or (not os.path.isdir(args[1])):
      parser.error('Source and destinations must be directories')
   else:

      #-d option
      if options.debug == True:
        logging.getLogger('').setLevel(logging.DEBUG)
      else:
        logging.getLogger('').setLevel(logging.INFO)

      #logger setup
      if options.verbose is True:
        console = logging.StreamHandler()
        console.setFormatter(logging.Formatter('%(asctime)s: %(message)s'))
        logging.getLogger('').addHandler(console)
   
      if options.logfile is not None:
        logfile = logging.FileHandler(options.logfile)
        logfile.setFormatter(logging.Formatter('%(asctime)s %(levelname)-8s %(message)s'))
        logging.getLogger('').addHandler(logfile)             

      if options.test:
        logging.info('Test Run! -- No files will actually be copied or reencoded')

      logging.debug('Options: %s' % options)
      logging.debug('Arguments: %s' % args)

      #begin recursive scan
      scanSourceTree(args[0],args[1],options.b,options.j,options.test,options.customMeta)
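
The script keeps the number of simultaneous transcodes in check by acquiring a threading.Semaphore before it starts each worker thread and releasing it when the worker is done. Below is a stand-alone sketch of that throttling pattern, with the release placed in a finally block so a failing job cannot leave the loop waiting for a slot that never comes back. `run_bounded` and the jobs passed to it are hypothetical names; each job stands in for the decode and encode steps.

```python
import threading

def run_bounded(jobs, limit):
    """Run each zero-argument callable in the list `jobs` on its own
    thread, with at most `limit` of them running at the same time."""
    slots = threading.Semaphore(limit)
    results = [None] * len(jobs)
    threads = []

    def worker(index, job):
        try:
            results[index] = job()
        except Exception as exc:
            results[index] = exc   # record the failure instead of losing it
        finally:
            slots.release()        # always hand the slot back

    for index, job in enumerate(jobs):
        slots.acquire()            # blocks while `limit` workers are busy
        t = threading.Thread(target=worker, args=(index, job))
        t.start()
        threads.append(t)

    # wait for the stragglers before returning
    for t in threads:
        t.join()
    return results
```

With this shape, a job that raises shows up as an exception object in the result list and the remaining jobs still run.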