McTool - move copy tool

Note

I am now developing this tool at work, so it is no longer available here on my web site. I am, however, leaving this page up because I know that once something has been posted online, there's no hope of removing it completely. This way, no one will be confused about what has happened to the project.

If you are interested in the tool, please contact me directly.

Summary

McTool is a program for Moving and Copying files.

XCopy wouldn't do what I wanted, and TotalCopy kept running out of resources when I tried to copy large numbers (millions) of files. So, like any good developer, instead of resorting to an existing tool (like RoboCopy), I simply re-invented the copy process. McTool was developed in C# and requires the Microsoft .NET 2.0 (or higher) framework.

It's designed to copy many (relatively small) files between high-performance networked storage devices and may not work well for other purposes. In that situation, McTool usually gets 5-10 times the performance of other copy tools (Windows Explorer, batch file copy, XCopy, RoboCopy, Xxcopy, TotalCopy, etc.), depending on the hardware and the files being processed. Of course, it also provides features that the other tools don't (a GUI, e-mail reports, real-time statistics, etc.).

Things McTool is not good at (or even capable of):

  • Copying a single file (or just a few files)
  • Renaming files as they are copied
  • Including or excluding files based on date/time or file attributes
  • Two-way synchronization

Things it can do, but was not designed for:

  • Copying relatively large files
  • Copying a single directory of files
  • Backing up an entire hard drive

Installation

Copy the files somewhere locally (it won't run from a network drive without modifying the .NET permissions) and run it.

Command Line Options

  • Usage:
    • McTool.exe [\\server\share\path\settings.xml] [go]

  • Where:
    • Parameter 1 [optional] = The path to the settings file you wish to use.
    • Parameter 2 [optional] = Begin the copy. Anything will work here (literally - it doesn't matter what it is). If there is a second parameter on the command line, McTool will begin the Copy, McSync, or Move immediately after loading the settings file. Use the "Exit when finished" option in your settings file if you want it to quit when it's done processing.

Without any parameters, McTool simply runs as if you had clicked and started the program from the Windows start menu.

To get the most out of this capability, create a settings file with the McTool UI the first time, then use it as a template: copy and edit it whenever you need a new settings file for a process.
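
As a minimal sketch of scripting this, the Python helper below assembles the command line described in the usage section. The function name `build_mctool_command` and both example paths are hypothetical; only the two optional positional parameters come from the usage line above.

```python
def build_mctool_command(exe_path, settings_path=None, start=False):
    """Return the argument list for launching McTool.

    Parameter 1 (optional) is the settings file. Parameter 2 (optional)
    can be any text and tells McTool to begin processing immediately.
    """
    args = [exe_path]
    if settings_path is not None:
        args.append(settings_path)
        if start:
            # Any second parameter works; "go" matches the usage line.
            args.append("go")
    return args


# Hypothetical paths, for illustration only.
cmd = build_mctool_command(r"C:\Tools\McTool\McTool.exe",
                           r"\\server\share\path\settings.xml",
                           start=True)
print(cmd)
# To actually launch it on a machine where McTool is installed:
#   import subprocess; subprocess.run(cmd)
```

The start flag is ignored when no settings file is given, because the "go" parameter is only meaningful as the second parameter.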

Filename Pattern Matching with Regular Expressions

I haven't put together any super-interesting patterns yet to show as examples, but I'm sure someone will come up with some.  Here are a couple of samples:

What to copy                            RegEx pattern
All files except .JPG and .J2K files    (?i)(?<!\.(jpg|j2k))$
Only .JPG or .TIF files                 (?i)\.(jpg|tif)$

More information about regular expressions and pattern matching can be found here: http://www.regular-expressions.info/tutorial.html
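
To see what the two sample patterns select, here is a small Python check. McTool itself uses the .NET regular expression engine, so this is only an approximation: it assumes the pattern is searched for anywhere in the filename (not matched against the whole name), and the filenames are made up for illustration.

```python
import re

# Patterns copied from the samples above.
EXCEPT_JPG_J2K = re.compile(r"(?i)(?<!\.(jpg|j2k))$")
ONLY_JPG_TIF = re.compile(r"(?i)\.(jpg|tif)$")


def matching_files(pattern, filenames):
    """Return the filenames in which the pattern is found."""
    return [name for name in filenames if pattern.search(name)]


# Hypothetical filenames.
names = ["scan001.JPG", "scan001.j2k", "scan001.tif", "notes.txt"]

print(matching_files(EXCEPT_JPG_J2K, names))  # ['scan001.tif', 'notes.txt']
print(matching_files(ONLY_JPG_TIF, names))    # ['scan001.JPG', 'scan001.tif']
```

The `(?i)` prefix makes both patterns case-insensitive, and the negative lookbehind `(?<!...)` in the first one rejects any name whose last four characters are `.jpg` or `.j2k`.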

Tips & Tricks

A few things that may come in handy someday.

Threads

Always experiment with the number of threads you use. If you are copying to a high-speed network-attached storage device, the default value of 23 threads may work out well. If you are using an external USB drive, it probably won't (try 3 in this case). The bottom line is that you'll want to experiment and see what works best for your situation. One thing to know when changing the number of threads while the program is running: each thread works on a single directory, so once a thread starts running, it must finish processing that directory before it will end (watch the "number of threads running" stat counter).
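
The "one directory per thread" idea can be sketched in Python as follows. This is not McTool's code (McTool is written in C#); the function names are hypothetical, and the sketch leaves out retries, statistics, and changing the thread count while running.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_one_directory(src_dir, dst_dir):
    """Copy the files directly inside src_dir (subdirectories are separate jobs)."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = 0
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if entry.is_file():
                shutil.copy2(entry.path, os.path.join(dst_dir, entry.name))
                copied += 1
    return copied


def copy_tree_threaded(src_root, dst_root, threads=23):
    """Copy a tree, handing each directory to one worker thread at a time."""
    jobs = []
    for dirpath, _dirnames, _filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        jobs.append((dirpath, os.path.normpath(os.path.join(dst_root, rel))))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each worker finishes its whole directory before taking another job.
        return sum(pool.map(lambda job: copy_one_directory(*job), jobs))
```

Calling `copy_tree_threaded(source, destination, threads=3)` returns the number of files copied. Because a worker owns a directory until it is done, a tree with a few huge directories keeps fewer threads busy than a tree with many small ones.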

2007-10-22 - I have found that McTool can overwhelm certain systems if you set the number of threads too high. This has caused at least one high-performance machine to go into a memory deprivation mode where it was constantly swapping memory to disk and was not able to service the file requests nearly as fast. I found that if I ran more than 75 simultaneous threads, copy speeds would suffer significantly. With 23 threads, it averaged about 70 MB per second of throughput on a 1 Gb network segment and peaked at over 90 MB/sec.
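
For a rough sense of scale, the short calculation below converts those figures into link utilization. It assumes "1 Gb" means a 1 gigabit-per-second link, that MB means 10^6 bytes, and it ignores protocol overhead; the function name is hypothetical.

```python
def link_utilization(throughput_mb_per_s, link_gbit_per_s=1.0):
    """Fraction of the raw link rate used by the given throughput."""
    link_mb_per_s = link_gbit_per_s * 1000 / 8  # 1 Gb/s is 125 MB/s
    return throughput_mb_per_s / link_mb_per_s


print(f"{link_utilization(70):.0%}")  # average: 56%
print(f"{link_utilization(90):.0%}")  # peak: 72%
```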

Copying the directory structure, but not the files

Set a bogus file pattern and it won't copy any files, but it will create the directory structure.
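
One pattern that is guaranteed never to match is the empty negative lookahead `(?!)`, which is valid in both .NET and Python. The Python sketch below shows the effect using a hypothetical `copy_matching` helper; it illustrates the trick and is not McTool's code.

```python
import os
import re
import shutil

NEVER_MATCH = r"(?!)"  # empty negative lookahead: matches no filename


def copy_matching(src_root, dst_root, pattern):
    """Recreate every directory, but copy only files whose name matches."""
    regex = re.compile(pattern)
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target, exist_ok=True)  # created whether or not files match
        for name in filenames:
            if regex.search(name):
                shutil.copy2(os.path.join(dirpath, name),
                             os.path.join(target, name))
                copied += 1
    return copied
```

With `NEVER_MATCH` as the pattern, the function returns 0 and leaves behind an empty copy of the directory tree.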

Check source size and then wait

If you're not sure that the files from the source will fit on the destination, you can set the number of threads to zero (0). McTool will scan the source directory(ies) and then send you an e-mail telling you how big the source is and how much space is free on the destination. You can then either increase the number of threads to let it copy, or cancel if there's not enough room.
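
The same check can be approximated outside McTool. The Python sketch below totals the size of the source files and compares it with the free space on the destination; the function names are hypothetical, and it does not account for cluster-size overhead or for files that already exist on the destination.

```python
import os
import shutil


def source_size_bytes(source_paths):
    """Total size in bytes of all files under the given source directories."""
    total = 0
    for root in source_paths:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                total += os.path.getsize(os.path.join(dirpath, name))
    return total


def fits_on_destination(source_paths, dest_path):
    """Return (bytes needed, bytes free, whether the source fits)."""
    needed = source_size_bytes(source_paths)
    free = shutil.disk_usage(dest_path).free
    return needed, free, needed <= free
```

`dest_path` must be an existing directory on the destination volume, since `shutil.disk_usage` reports on the volume that contains it.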

Database Tracking

McTool can log its operations to a database if a valid connection string is present in File | Preferences | Reporting.

Structure for the History table

      CREATE TABLE [dbo].[History](
          [ID] [bigint] IDENTITY(1,1) NOT NULL,
          [SourcePath] [varchar](8000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
          [DestPath] [varchar](400) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
          [Status] [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
          [Errors] [int] NULL,
          [StartTime] [datetime] NOT NULL,
          [EndTime] [datetime] NULL,
          [Files] [int] NULL,
          [FilesCopied] [int] NULL,
          [FilesSkipped] [int] NULL,
          [Bytes] [bigint] NULL,
          [BytesCopied] [bigint] NULL,
          [BytesSkipped] [bigint] NULL,
          [Dirs] [int] NULL,
          [DirsCopied] [int] NULL,
          [BytesFreeStart] [bigint] NULL,
          [BytesFreeFinish] [bigint] NULL,
          [Username] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
          [Machine] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
          [Version] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
       CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED
      (
          [ID] ASC
      ) ON [PRIMARY]
      ) ON [PRIMARY]

Structure of the PathsCopied table

      CREATE TABLE [dbo].[PathsCopied](
              [ID] [bigint] IDENTITY(1,1) NOT NULL,
              [SourcePath] [varchar](400) NOT NULL,
              [DestPath] [varchar](400) NOT NULL,
              [SourceFileCount] [int] NOT NULL,
              [DestinationOriginalFileCount] [int] NULL,
              [DestinationFinalFileCount] [int] NULL,
              [EndTime] [smalldatetime] NOT NULL,
              [ParentID] [bigint] NOT NULL,
       CONSTRAINT [PK_PathsCopied] PRIMARY KEY CLUSTERED
      (
              [ID] ASC
      )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
      ) ON [PRIMARY]

Querying the database

Here are a few queries I have found to be useful:

Running jobs (condensed)

select SourcePath, DestPath, replace(replace(status,'Time Remaining (@ last/total rate):  ',''),'complete ','') as Status, 
Username, Machine, Errors, StartTime, EndTime, Files, FilesCopied, FilesSkipped, Bytes, BytesCopied, BytesSkipped, Dirs,
DirsCopied, BytesFreeStart, BytesFreeFinish, ID, Version
from History
where datediff(minute, endtime, getdate()) < 2
order by starttime desc

Current copy speed of ALL running jobs in MB/sec

select sum(cast(left(right(status, len(status) - charindex('[', status)), 
charindex(' ', right(status, len(status) - charindex('[', status))) - 1) as float))
from History
where Status like '~%' and datediff(minute, endtime, getdate()) < 2
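
The nested left/right/charindex calls in that query pull out the number that follows the first "[" in the Status text, up to the next space. The Python sketch below does the same extraction step by step. The status strings are hypothetical (the real Status format is not shown on this page); all the query tells us is that a running job's status starts with "~" and contains "[", a number, and then a space.

```python
def rate_from_status(status):
    """Return the number between the first '[' and the next space."""
    after_bracket = status[status.index("[") + 1:]
    return float(after_bracket[:after_bracket.index(" ")])


def total_rate(statuses):
    """Sum the rates of all statuses that start with '~' (running jobs)."""
    return sum(rate_from_status(s) for s in statuses if s.startswith("~"))


# Hypothetical status values, for illustration only.
statuses = [
    "~00:12:34 [42.5 MB/sec]",
    "~01:02:03 [17.5 MB/sec]",
    "Finished",
]
print(total_rate(statuses))  # 60.0
```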

Totals from the History log

select replace(convert (varchar, convert(money, sum(FilesCopied)), 1), '.00', '') as FilesCopied, 
replace(convert (varchar, convert(money, sum(FilesSkipped)), 1), '.00', '') as FilesSkipped,
convert(varchar, convert(money, sum(BytesCopied) / 1099511627776.00), 1) + ' TB' as BytesCopied,
convert(varchar, convert(money, sum(BytesSkipped) / 1099511627776.00), 1) + ' TB'  as BytesSkipped,
replace(convert (varchar, convert(money, sum(DirsCopied)), 1), '.00', '') as DirsCopied
from History
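
The convert-to-money trick in that query is only there to get thousands separators, and 1099511627776 is 2^40, the number of bytes in one (binary) terabyte. A rough Python equivalent of the formatting, with hypothetical function names, looks like this. Rounding at the last decimal place may differ slightly from SQL Server's money conversion.

```python
BYTES_PER_TB = 1099511627776  # 2**40, the divisor used in the query


def format_count(n):
    """Format an integer with thousands separators, e.g. 1,234,567."""
    return f"{n:,}"


def format_tb(num_bytes):
    """Format a byte count as terabytes with two decimals, e.g. 1.50 TB."""
    return f"{num_bytes / BYTES_PER_TB:,.2f} TB"


print(format_count(1234567))             # 1,234,567
print(format_tb(3 * BYTES_PER_TB // 2))  # 1.50 TB
```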

Misc

select * from History where destpath like '%gbrmil1914r4pt2%'
select * from History where sourcepath like '%como%' order by starttime
select top 1000 * from PathsCopied where SourcePath like '%bedfordshire%'
select * from PathsCopied where endtime > '2008-09-01' and sourcefilecount <> destinationfinalfilecount

Enhancements

If you think of other things that could be included in the program (or might be worth considering), or if it doesn't work for what you need to do, let me know. On the list or done already (strikeout items are done):

  • multithreading to speed up copies from multiple directories
  • command line compatibility
  • configuration file saving
  • better/more statistics display
  • sliding window for throughput measurement
  • validation on directory path strings
  • regular expression filename matching
  • multiple output directories (how would this work with move?)
  • MD5 (or other hash) file comparison option
  • speed throttling (isn't it designed to go as fast as possible?)
  • better handling for low and out of disk space problems
  • recover from serious hardware/network issues (rebooted machine while a copy was in progress, lost Remote Desktop session, fatal program error, etc.)
  • ability to skip and/or abort the scan step (if you know it's going to fit and don't care about the related stats)
  • option to do a byte comparison before copying if the file sizes are equal (date/time ignored)
  • exit when finished processing
  • e-mail only once an hour for low disk space
  • Disk space checking on output
  • warn/pause when low disk space
  • return errorlevel with command line version
  • pause
  • exclude files - file based list of entries (including regular expressions?)
  • file based include list (with RegEx?)
  • copy NTFS permissions

Change History

2009-04-16 ver. 2.6.0.31

  • Fix problem with "Skip prescan" checkbox getting disabled after 10 seconds.
  • Disable conflicting options when "Force Copy" is selected.
  • Changed a directory error to be just a warning until retries expire.

2009-03-24 ver. 2.6.0.16

  • Handle millisecond date/time problem on some filesystems.
  • Slow down progress bar when things are going fast.
  • Reformat e-mail with two spaces at beginning of each line so Outlook won't try to unwrap it.

2009-02-26 ver. 2.6.0.12

  • Better color handling of progress bar.
  • Better handling of Skip Prescan check box when running.
  • Faster checking of source directories. Only checks once instead of going through the full retry/wait cycle.
  • Better error handling for compare failures.

2009-02-23 ver. 2.6.0.0

  • Added ability to skip the Prescan either before or during program execution.

2009-01-16 ver. 2.5.1.84

  • Added ability to compare the files after the Move/Copy. Uses SHA1 hash algorithm.
  • Better UI reset.

2009-01-10 ver. 2.5.1.82

  • Minor timing changes between prescan and copy threads.
  • Better testing for thread completion.
  • Minor log file format update.

2009-01-06 ver. 2.5.1.78

  • Added tracking for when processing is paused. Pausing will no longer adversely affect the throughput stats.
  • Much better error handling and reporting under adverse circumstances (network troubles, etc.).
  • Misc. other enhancements.

2008-11-21 ver. 2.5.1.52

  • Re-enable changing thresholds while running.
  • Misc. minor changes.

2008-10-25 ver. 2.5.1.47

  • Don’t require a destination directory when using “None” as the Copy Method (like when just checking the size of the source files).
  • If the number of threads is set to zero (0) when the prescan finishes, a status report will be e-mailed. Increasing the threads will then allow processing to continue.

2008-10-20 ver. 2.5.1.42

  • Changed low space e-mails to only get sent once (don't ignore them anymore!).

2008-10-15 ver. 2.5.1.40

  • Added more stats to log file (directory handling method, file counts in front of file sizes).

2008-10-03 ver. 2.5.1.34

  • Better error reporting 

2008-06-12 ver. 2.5.23.13

  • Pause will now pause everything, not just copy threads.
  • Canceling works better.
  • A little UI reorg. to make it a bit smaller and show a few more stats.
  • Other misc. enhancements/fixes.

2008-01-04 ver. 2.4.2.4

  • Drag and drop on the Source path now appends to whatever is already there. Formatted thread display window better.
  • Added check for write permissions and/or low disk space on destination. Fixed problem with "can't access directory" messages appearing when canceling a process.
  • Make sure the destination directory is there before writing the free space test file to it.
  • Added Pause feature. Added disk space thresholds for Low and Minimum disk space on the destination. It will send an e-mail (if e-mailing is enabled) when those thresholds are reached. If it hits the minimum threshold, it will automatically pause move/copy processing.
  • Fixed low disk space e-mails from being repeated. Added disk space remaining to summary report.
  • Added e-mail settings (file | preferences)

Date        Version   Description
12/13/2007  2.4.0.1   Changed filename matching from wildcards to regular expressions.
12/13/2007  2.4.0.0   Added ability to select date/time compare options: Newer, Different, Ignore.
12/12/2007  2.3.0.1   Fixed files being left in the source if they already existed in the destination during moves.
12/11/2007  2.3.0.0   Changed date/time compare handling. Files will now only be copied if they are NEWER.
12/9/2007   2.2.1.29  New logging class - no difference externally, but now the class is available for other projects.
11/28/2007  2.2.1.26  Updated some more error messages.
11/12/2007  2.2.1.20  More accurate time remaining. Better retry on error handling. Updated some error messages.
11/5/2007   2.2.1.10  Carriage return in source path with only 1 path caused exception while sending e-mail message.
11/1/2007   2.2.1.9   Added error handling for importing/exporting settings file.
10/29/2007  2.2.1.8   More efficient sharing of scanning and copying threads.
10/24/2007  2.2.1.4   Additional completion time estimate based on last throughput measurement.
10/22/2007  2.2.1.0   Rearranged UI to use screen better. Removed tabs and a couple other items.
10/20/2007  2.2.0.0   Reworked threading to make it follow the trees from top to bottom.
10/18/2007  2.1.0.27  Added a "threads running" window - drag the main window wider horizontally to see it.
10/16/2007  2.1.0.24  Configuration file saving. Command line version.
10/13/2007  2.1.0.20  More consistent error reporting.
10/10/2007  2.1.0.15  Current BPS stats (sliding window).
10/8/2007   2.1.0.0   Fixes to multithreading for deep directories.
10/1/2007   2.0.0.0   Multithreading.
9/20/2007   1.0.4.31  More stats. Added Drag and Drop to the paths fields.
2/5/2007    1.0.4.0   One-way syncing option added (McSync).
2/1/2007    1.0.3.0   Now handles multiple source paths.
1/30/2007   1.0.2.0   Wildcard handling.
1/17/2007   1.0.1.0   Added delete tree functionality on the delete tab.
11/16/2006  1.0.0.12  Added options to set the number of retries and the time between them on file/directory operation failures.
11/16/2006  1.0.0.10  Added cancel button.
11/14/2006  1.0.0.7   Initial release.