McTool - move copy tool

This is the old page. The new page is here


Summary

McTool is a program for Moving and Copying files.

Since XCopy wouldn't do what I wanted, and I had been having trouble with TotalCopy running out of resources when trying to copy large numbers (millions) of files, I did what any good developer would do: instead of resorting to an existing tool (like RoboCopy), I simply re-invented the copy process. It was developed in C# and requires the Microsoft .NET Framework 4.0 (or higher).

It's designed to copy many (relatively small) files between high-performance networked storage devices and may not work well for other purposes. In this situation, McTool usually gets 5-10 times the performance of other copy tools (Windows Explorer, batch file copy, XCopy, RoboCopy/RichCopy, XXCopy, TotalCopy, etc.), depending on the hardware and files being processed. Of course, it also provides features that the other tools don't (a GUI, e-mail reports, real-time statistics, etc.).

Things McTool is not good at (or even capable of):

  • Copying a single file (or just a few files)
  • Renaming files as they are copied
  • In/excluding files based on date/time or file attributes
  • Two-way synchronization

Things it can do, but was not designed for:

  • Copying relatively large files
  • Copying a single directory of files
  • Backing up an entire hard drive

Change History

Download

  • Downloads are now available from the change history page. This lets me keep previous versions online in case there are issues with new releases.

Installation

Extract the single file from the ZIP archive and copy it somewhere locally (it won't run from a network drive without modifying the default .NET permissions) and run it.

Command Line Options

  • Usage:
    • McTool.exe [\\server\share\path\settings.xml] [go]

  • Where:
    • Parameter 1 [optional] = The path to the settings file you wish to use.
    • Parameter 2 [optional] = Begin the copy. Anything will work here (literally - it doesn't matter what it is). If there is a second parameter on the command line, McTool will begin the Copy, McSync, or Move immediately after loading the settings file. Use the "Exit when finished" option in your settings file if you want it to quit when it's done processing.

  • Examples:
    • mctool d:\settings.xml go
      • The settings file [d:\settings.xml] will be loaded and the processing will begin.
    • c:\utilities\mctool.exe \\alderan\shared_music\backup_music.xml
      • The settings file [\\alderan\shared_music\backup_music.xml] will be loaded upon startup and you may modify the settings as needed before running the copy program.
    • mctool
      • With no parameters, McTool simply runs as if you had started the program from the Windows Start menu.

To make the most of the command line capabilities, create a settings file with the McTool UI the first time, then use it as a template for the new files your process needs. You may set up desktop shortcuts with settings files to perform repetitive tasks easily. You can also create batch (or script) files to automate multiple copy processes.
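As one way to script several copy jobs, the sketch below builds the command line described above and runs the jobs one after another from Python. The executable location and settings file names are hypothetical, and it assumes each settings file has the "Exit when finished" option enabled so that McTool hands control back to the script.

```python
import subprocess

def build_command(exe_path, settings_path=None, start=False):
    """Build the argument list: McTool.exe [settings.xml] [go]."""
    command = [exe_path]
    if settings_path is not None:
        command.append(settings_path)
        if start:
            # Any second parameter starts processing; "go" is just a convention.
            command.append("go")
    return command

def run_jobs(exe_path, settings_files):
    """Run one McTool instance per settings file, one after another."""
    for settings in settings_files:
        # Blocks until McTool exits, which assumes "Exit when finished" is set.
        subprocess.run(build_command(exe_path, settings, start=True))

# Hypothetical usage:
# run_jobs(r"c:\utilities\mctool.exe", [r"d:\job1.xml", r"d:\job2.xml"])
```

Running the jobs sequentially keeps them from competing for the same storage; whether that is faster than running them in parallel depends on your hardware.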

Filename Pattern Matching with Regular Expressions

I haven't put together any super-interesting patterns yet to show as examples, but I'm sure someone will come up with some.  Here are a couple of samples:

What to copy                           RegEx pattern
All files except .JPG and .J2K files   (?i)(?<!\.(jpg|j2k))$
Only .JPG or .TIF files                (?i)\.(jpg|tif)$

More information about regular expressions and pattern matching can be found here: http://www.regular-expressions.info/tutorial.html
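If you want to try a pattern before putting it in a settings file, a quick check like the following can help. It is a sketch in Python rather than .NET (the two sample patterns behave the same way in both engines), the filenames are made up, and it assumes the pattern is searched for anywhere in the filename rather than required to match the whole name.

```python
import re

# The two sample patterns from the table above.
EXCLUDE_JPG_J2K = re.compile(r"(?i)(?<!\.(jpg|j2k))$")
ONLY_JPG_TIF = re.compile(r"(?i)\.(jpg|tif)$")

def matches(pattern, filename):
    """True if the pattern is found anywhere in the filename."""
    return pattern.search(filename) is not None

# Made-up filenames, one per interesting case.
for name in ["photo.JPG", "scan.tif", "image.j2k", "notes.txt"]:
    print(name, matches(EXCLUDE_JPG_J2K, name), matches(ONLY_JPG_TIF, name))
```

The first pattern works by anchoring at the end of the name and using a negative lookbehind to reject names that end in the excluded extensions.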

Tips & Tricks

A few things that may come in handy someday.

If you are MOVING files, prescanning must complete before any copying begins (otherwise things could get moved out from under the scan process). If you are copying, the prescanning occurs alongside the copying, but it's ONLY for your information - you don't NEED to prescan the files. In either case (move or copy), if speed is the most important thing, simply turn it off - you will still get a report at the end of what it actually did.

I have seen throughput speeds of up to 95 MB/sec over a Gbit network using a single instance of the tool. With certain types of files (lots of very small ones - say, 100K or less), running multiple instances on multiple machines could potentially speed things up even more because of the way SMB works (it's very chatty). SMB2 could help with this, but I don't have any networks using it to test with.

Generally, getting the application to run directly on either the source or the destination machine is better than using a "man in the middle" machine to do the work.

Threads

If you are using fast networked storage systems, you should be able to use a good number of threads. If the drives you are using aren't set up as arrays, or if they have a relatively small number of drives in the array (8 is a relatively small number of drives), sometimes using fewer threads can help. I have only seen fewer threads be helpful in a couple of cases, though; usually, more works better.

McTool uses a single thread to process an entire directory. Subdirectories are handled as separate directories with their own threads. This means that if all the files are in a single directory (yikes!), it will only ever use one thread. Keeping huge numbers of files (100,000 is a huge number of files to me) in a single location causes file handling issues (slowness, memory problems, etc.) in many Windows applications so I haven't seen very many instances where this was a severe limitation.

If you are using a single external USB drive, 23 threads probably won't work well (try 5 in this case). The bottom line is that you'll want to experiment and see what works best for your situation. One thing to know when changing the number of threads while the program is running is that since each thread works on a single directory, once a thread starts running, it must finish processing that directory before it will end (watch the "number of threads running" stat counter).
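The one-thread-per-directory model described above can be sketched as follows. This is a Python illustration of the idea, not McTool's C# implementation; the function names and the default of 4 threads are made up for the example.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def copy_one_directory(src_dir, dst_dir):
    """Copy only the files directly inside src_dir (no recursion)."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if entry.is_file():
                shutil.copy2(entry.path, os.path.join(dst_dir, entry.name))
                copied += 1
    return copied

def copy_tree_per_directory(src_root, dst_root, max_threads=4):
    """Treat every directory in the tree as one unit of work for the pool."""
    jobs = []
    for dirpath, _dirnames, _filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        jobs.append((dirpath, os.path.normpath(os.path.join(dst_root, rel))))
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # A worker finishes its whole directory before taking the next one.
        counts = pool.map(lambda job: copy_one_directory(*job), jobs)
        return sum(counts)
```

With this design, a tree that keeps all of its files in a single directory produces a single job, so only one worker ever has anything to do, which is the limitation noted above.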

Some threads are used by the program, so going too low may not allow copying to occur.

If you expand the program's window size horizontally, you can see the threads that are running. If there are significantly fewer "scan" threads than "normal" threads running, it usually means that the source system is having a hard time keeping up. The tool allocates the same number of prescan threads as copy threads, so during this process, it can use up to twice as many threads as you have set.

2007-10-22 - I have found that McTool can overwhelm certain systems if you set the number of threads too high. This has caused at least one high-performance machine to go into a memory deprivation mode where it was constantly swapping memory to disk and was not able to service the file requests nearly as fast. I found that if I ran more than 75 simultaneous threads, copy speeds would suffer significantly. With 23 threads, it was getting about 70 MB per second average throughput on a 1 Gb network segment. It peaked at over 90 MB/sec.
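To put those throughput figures in context, the following sketch converts a payload rate into a fraction of the link's raw bit rate. It assumes decimal units (1 MB = 1,000,000 bytes, 1 Gbit = 1,000,000,000 bits) and ignores protocol overhead, so treat the result as a rough guide only.

```python
def link_utilization(mb_per_sec, link_gbit=1.0):
    """Fraction of the link's raw bit rate used by the payload."""
    # Assumes decimal units: 1 MB = 10**6 bytes, 1 Gbit = 10**9 bits.
    bits_per_sec = mb_per_sec * 8 * 10**6
    return bits_per_sec / (link_gbit * 10**9)

print(f"{link_utilization(70):.0%}")  # average rate quoted above
print(f"{link_utilization(90):.0%}")  # peak rate quoted above
```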

Copying the directory structure, but not the files

Set a bogus file pattern (something that will never match anything) and it won't copy any files, but it will create the directory structure in the destination location.
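If the file pattern is being interpreted as a regular expression, one candidate for a "bogus" pattern is an empty negative lookahead, (?!), which cannot match any text in either .NET or Python. The check below is a Python sketch with made-up filenames; whether McTool treats the pattern field as a regular expression in your settings is an assumption you should verify.

```python
import re

# An empty negative lookahead fails at every position in any string.
NEVER_MATCHES = re.compile(r"(?!)")

def would_copy(filename, pattern=NEVER_MATCHES):
    """True if the filename would be selected by the pattern."""
    return pattern.search(filename) is not None

print(any(would_copy(name) for name in ["a.jpg", "readme", "data.tif"]))
```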

Check source size and then wait

If you're not sure that the files from the source will fit on the destination, you can set the number of threads to zero (0). It will scan the source directory(ies) and then send you an e-mail telling you how big it was, and how much space is free on the destination. You can then either increase the number of threads to let it copy, or cancel if there's not enough room.
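The same check can be approximated outside of McTool. The sketch below is not how McTool does it; it is a small Python stand-in that totals the file sizes under a source directory and compares the result with the free space reported for the destination. The function names are made up for the example.

```python
import os
import shutil

def tree_size_bytes(root):
    """Total size in bytes of all files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def fits_on_destination(source_root, dest_path):
    """Return (bytes needed, bytes free, whether the source would fit)."""
    needed = tree_size_bytes(source_root)
    free = shutil.disk_usage(dest_path).free
    return needed, free, needed <= free
```

This comparison ignores files that already exist at the destination and any file system overhead, so leave yourself some margin.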

Database Tracking

McTool can log its operations to a SQL Server database if a valid connection string is present in File | Preferences | Reporting.

Structure for the History table

      CREATE TABLE [dbo].[History](
          [ID] [bigint] IDENTITY(1,1) NOT NULL,
          [SourcePath] [varchar](8000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
          [DestPath] [varchar](400) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
          [Status] [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
          [Errors] [int] NULL,
          [StartTime] [datetime] NOT NULL,
          [EndTime] [datetime] NULL,
          [Files] [int] NULL,
          [FilesCopied] [int] NULL,
          [FilesSkipped] [int] NULL,
          [Bytes] [bigint] NULL,
          [BytesCopied] [bigint] NULL,
          [BytesSkipped] [bigint] NULL,
          [Dirs] [int] NULL,
          [DirsCopied] [int] NULL,
          [BytesFreeStart] [bigint] NULL,
          [BytesFreeFinish] [bigint] NULL,
          [Username] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
          [Machine] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
          [Version] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
       CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED
      (
          [ID] ASC
      ) ON [PRIMARY]
      ) ON [PRIMARY]

Structure of the PathsCopied table

      CREATE TABLE [dbo].[PathsCopied](
              [ID] [bigint] IDENTITY(1,1) NOT NULL,
              [SourcePath] [varchar](400) NOT NULL,
              [DestPath] [varchar](400) NOT NULL,
              [SourceFileCount] [int] NOT NULL,
              [DestinationOriginalFileCount] [int] NULL,
              [DestinationFinalFileCount] [int] NULL,
              [EndTime] [smalldatetime] NOT NULL,
              [ParentID] [bigint] NOT NULL,
       CONSTRAINT [PK_PathsCopied] PRIMARY KEY CLUSTERED
      (
              [ID] ASC
      )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
      ) ON [PRIMARY]

Querying the database

Here are a few queries I have found to be useful:

Running jobs (condensed)

select SourcePath, DestPath, replace(replace(status,'Time Remaining (@ last/total rate):  ',''),'complete ','') as Status, 
Username, Machine, Errors, StartTime, EndTime, Files, FilesCopied, FilesSkipped, Bytes, BytesCopied, BytesSkipped, Dirs,
DirsCopied, BytesFreeStart, BytesFreeFinish, ID, Version
from History
where datediff(minute, endtime, getdate()) < 2
order by starttime desc

Current copy speed of ALL running jobs in MBPS

select sum(cast(left(right(status, len(status) - charindex('[', status)), 
charindex(' ', right(status, len(status) - charindex('[', status))) - 1) as float))
from History
where Status like '~%' and datediff(minute, endtime, getdate()) < 2
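The nested left/right/charindex calls in that query take the text between the first "[" and the next space and cast it to a number. The Python sketch below performs the same slicing, which may be easier to follow. The sample status strings in the usage comment are hypothetical, since the exact status format is not documented here.

```python
def extract_rate(status):
    """Return the number between the first '[' and the following space."""
    after_bracket = status[status.index("[") + 1:]
    return float(after_bracket[:after_bracket.index(" ")])

def total_rate(statuses):
    """Sum the rates of all statuses that start with '~' (running jobs)."""
    return sum(extract_rate(s) for s in statuses if s.startswith("~"))

# Hypothetical usage:
# total_rate(["~00:05:10 [42.5 MB/sec]", "~00:01:00 [10 MB/sec]", "Finished"])
```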

Totals from the History log

select replace(convert (varchar, convert(money, sum(FilesCopied)), 1), '.00', '') as FilesCopied, 
replace(convert (varchar, convert(money, sum(FilesSkipped)), 1), '.00', '') as FilesSkipped,
convert(varchar, convert(money, sum(BytesCopied) / 1099511627776.00), 1) + ' TB' as BytesCopied,
convert(varchar, convert(money, sum(BytesSkipped) / 1099511627776.00), 1) + ' TB'  as BytesSkipped,
replace(convert (varchar, convert(money, sum(DirsCopied)), 1), '.00', '') as DirsCopied
from History

Misc

select * from History where destpath like '%gbrmil1914r4pt2%'
select * from History where sourcepath like '%como%' order by starttime
select top 1000 * from PathsCopied where SourcePath like '%bedfordshire%'
select * from PathsCopied where endtime > '2008-09-01' and sourcefilecount <> destinationfinalfilecount
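The last query above looks for directories whose final destination file count differs from the source count. To see how it behaves, here is a sketch that runs the same condition against a cut-down PathsCopied table in an in-memory SQLite database. SQLite stands in for SQL Server only for illustration, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE PathsCopied (
           ID INTEGER PRIMARY KEY,
           SourcePath TEXT NOT NULL,
           SourceFileCount INTEGER NOT NULL,
           DestinationFinalFileCount INTEGER,
           EndTime TEXT NOT NULL)"""
)
rows = [
    (r"\\server\share\a", 100, 100, "2008-09-02 10:00"),  # counts agree
    (r"\\server\share\b", 100, 97, "2008-09-03 11:00"),   # mismatch
    (r"\\server\share\c", 50, 40, "2008-08-15 09:00"),    # mismatch, but too old
    (r"\\server\share\d", 20, None, "2008-09-04 12:00"),  # final count is NULL
]
conn.executemany(
    "INSERT INTO PathsCopied (SourcePath, SourceFileCount,"
    " DestinationFinalFileCount, EndTime) VALUES (?, ?, ?, ?)",
    rows,
)
mismatches = conn.execute(
    "SELECT SourcePath FROM PathsCopied"
    " WHERE EndTime > '2008-09-01'"
    " AND SourceFileCount <> DestinationFinalFileCount"
    " ORDER BY ID"
).fetchall()
print(mismatches)
```

Note that a row whose DestinationFinalFileCount is NULL is not returned, because a comparison with NULL is never true. Add an "or destinationfinalfilecount is null" condition if you want to see those rows as well.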

Enhancements

If you think of other things that could be included in the program (or might be worth considering), or if it doesn't work for what you need to do, let me know. On the list or done already (strikeout items are done):

  • multithreading to speed up copies from multiple directories
  • command line compatibility
  • configuration file saving
  • better/more statistics display
  • sliding window for throughput measurement
  • validation on directory path strings
  • regular expression filename matching
  • multiple output directories (how would this work with move?)
  • MD5 (or other hash) file comparison option
  • speed throttling (isn't it designed to go as fast as possible?)
  • better handling for low and out of disk space problems
  • recover from serious hardware/network issues (rebooted machine while a copy was in progress, lost Remote Desktop session, fatal program error, etc.)
  • ability to skip and/or abort the scan step (if you know it's going to fit and don't care about the related stats)
  • option to do a byte comparison before copying if the file sizes are equal (date/time ignored)
  • exit when finished processing
  • e-mail only once an hour for low disk space
  • Disk space checking on output
  • warn/pause when low disk space
  • return errorlevel with command line version
  • pause
  • exclude files - file based list of entries (including regular expressions?)
  • file based include list (with RegEx?)
  • copy NTFS permissions
  • Add MD5 in addition to SHA1