Note
I am now developing this tool at work, so it is no longer available here on my web site. I am, however, leaving this page up because I know that once something has been posted online, there's no hope of removing it completely. This way, no one will be confused about what has happened to the project.
If you are interested in the tool, please contact me directly.
Summary
McTool is a program for Moving and Copying files.
XCopy wouldn't do what I wanted, and TotalCopy kept running out of resources when I tried to copy large numbers (millions) of files. So, like any good developer, instead of resorting to an existing tool (like Robocopy), I simply reinvented the copy process. McTool is written in C# and requires the Microsoft .NET Framework 2.0 or higher.
It's designed to copy many relatively small files between high-performance networked storage devices and may not work well for other purposes. In that scenario, McTool usually gets 5 to 10 times the performance of other copy tools (Windows Explorer, batch file copy, XCopy, Robocopy, XXCOPY, TotalCopy, etc.), depending on the hardware and the files being processed. Of course, it also provides features that the other tools don't (a GUI, e-mail reports, real-time statistics, etc.).
Things McTool is not good at (or even capable of):
- Copying a single file (or just a few files)
- Renaming files as they are copied
- Including or excluding files based on date/time or file attributes
- Two-way synchronization
Things it can do, but was not designed for:
- Copying relatively large files
- Copying a single directory of files
- Backing up an entire hard drive
Installation
Copy the files to a local drive and run the program from there (it won't run from a network drive without modifying the .NET security permissions).
Command Line Options
- Usage:
- Where:
- Parameter 1 [optional] = The path to the settings file you wish to use.
- Parameter 2 [optional] = Begin processing. Anything will work here (literally; the value doesn't matter). If there is a second parameter on the command line, McTool will begin the Copy, McSync, or Move immediately after loading the settings file. Use the "Exit when finished" option in your settings file if you want it to quit when it's done processing.
Without any parameters, McTool simply runs as if you had started it from the Windows Start menu.
To get the most out of this capability, create a settings file manually in the McTool UI the first time, then use it as a template: copy and edit it to create new settings files as your processes need them.
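The rules for the two optional parameters reduce to a small decision table. Here is a hypothetical Python sketch of that logic, written only from the description above; `interpret_args` and the settings file name are invented for illustration, and McTool's own (C#) argument handling is not shown on this page.

```python
from typing import List, Optional, Tuple


def interpret_args(args: List[str]) -> Tuple[Optional[str], bool]:
    """Hypothetical reading of McTool's two optional parameters.

    args excludes the program name. Returns (settings_path, start_now):
      - no arguments: interactive start, no settings file loaded
      - one argument: load that settings file and wait for the user
      - a second argument (any value): load the settings file and
        start processing immediately; the value itself is ignored
    """
    settings_path = args[0] if args else None
    start_now = len(args) >= 2
    return settings_path, start_now


print(interpret_args([]))                              # (None, False)
print(interpret_args(["nightly-job.settings"]))        # ('nightly-job.settings', False)
print(interpret_args(["nightly-job.settings", "go"]))  # ('nightly-job.settings', True)
```

For an unattended run (a scheduled task, say), that means passing the settings file path plus any second word, with "Exit when finished" turned on in the settings file.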
Filename Pattern Matching with Regular Expressions
I haven't put together any super-interesting patterns to show as examples yet, but I'm sure someone will come up with some. Here are a couple of samples:
| What to copy | RegEx pattern |
| All files except .JPG and .J2K files | (?i)(?<!\.(jpg|j2k))$ |
| Only .JPG or .TIF files | (?i)\.(jpg|tif)$ |
More information about regular expressions and pattern matching can be found here: http://www.regular-expressions.info/tutorial.html
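A quick way to sanity-check a pattern before a long run is to try it against a few sample file names. The sketch below does that in Python. It assumes the pattern is tested against the file name alone (not the full path); the constructs used (inline `(?i)`, a fixed-width look-behind, alternation, `$`) behave the same way in Python's `re` and .NET's `Regex`. The groups are written as non-capturing `(?:...)` since nothing uses the captured text; the match results are unchanged.

```python
import re

# The two patterns from the table above (groups made non-capturing).
EXCEPT_JPG_J2K = re.compile(r"(?i)(?<!\.(?:jpg|j2k))$")
ONLY_JPG_TIF = re.compile(r"(?i)\.(?:jpg|tif)$")

# Made-up sample names; matching is case-insensitive because of (?i).
for name in ["scan001.JPG", "scan001.j2k", "scan001.tif", "notes.txt", "photo.tiff"]:
    print(name,
          "except-jpg/j2k:", bool(EXCEPT_JPG_J2K.search(name)),
          "only-jpg/tif:", bool(ONLY_JPG_TIF.search(name)))
```

Note that `photo.tiff` is not selected by the second pattern; `(?i)\.(jpg|tiff?)$` would cover both spellings.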
Tips & Tricks
A few things that may come in handy someday.
Threads
Always fiddle with the number of threads you use. If you are copying to a high-speed network-attached storage device, the default value of 23 threads may work well. If you are using an external USB drive, it probably won't (try 3 in this case). The bottom line is that you'll want to experiment and see what works best for your situation. When changing the number of threads while the program is running, keep in mind that each thread works on a single directory: once a thread starts running, it must finish processing that directory before it will end (watch the "number of threads running" stat counter).
2007-10-22: I have found that McTool can overwhelm certain systems if you set the number of threads too high. It pushed at least one high-performance machine into a memory-starved state where it was constantly swapping memory to disk and could not service file requests nearly as fast. When I ran more than 75 simultaneous threads, copy speeds suffered significantly. With 23 threads, it averaged about 70 MB per second of throughput on a 1 Gb network segment, peaking at over 90 MB/sec.
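The threading behaviour described in this section (one directory per thread, and a thread count that can change mid-run) can be sketched as follows. This is a hypothetical, simplified Python illustration, not McTool's C# code: there is no separate prescan, no retries, no file pattern and no statistics, and `DirectoryCopier` and its methods are invented names.

```python
import os
import queue
import shutil
import threading


class DirectoryCopier:
    """Hypothetical sketch of a thread-per-directory copy loop.

    Each worker takes one directory, copies every file in it, and queues
    its subdirectories for any worker to pick up. Workers only re-check
    the thread limit between directories, so lowering the limit takes
    effect as running workers finish their current directory.
    """

    def __init__(self, src_root, dst_root, max_threads):
        self.src_root = src_root
        self.dst_root = dst_root
        self.max_threads = max_threads
        self.running = 0                  # like the "threads running" counter
        self.lock = threading.Lock()
        self.pending = queue.Queue()      # relative paths of directories to copy
        self.stop = threading.Event()
        self.threads = []

    def set_max_threads(self, n):
        """Change the limit; any extra workers needed start right away."""
        with self.lock:
            self.max_threads = n
            to_start = max(0, n - self.running)
            self.running += to_start
        for _ in range(to_start):
            t = threading.Thread(target=self._worker, daemon=True)
            self.threads.append(t)
            t.start()

    def _copy_directory(self, rel_dir):
        src = os.path.join(self.src_root, rel_dir)
        dst = os.path.join(self.dst_root, rel_dir)
        os.makedirs(dst, exist_ok=True)
        with os.scandir(src) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # Subdirectories become new work items for any worker.
                    self.pending.put(os.path.join(rel_dir, entry.name))
                elif entry.is_file():
                    shutil.copy2(entry.path, os.path.join(dst, entry.name))

    def _worker(self):
        while not self.stop.is_set():
            with self.lock:
                if self.running > self.max_threads:
                    self.running -= 1     # limit was lowered: retire here
                    return
            try:
                rel_dir = self.pending.get(timeout=0.1)
            except queue.Empty:
                continue
            try:
                self._copy_directory(rel_dir)   # whole directory, no early exit
            except OSError as exc:
                print(f"error copying {rel_dir!r}: {exc}")
            finally:
                self.pending.task_done()
        with self.lock:
            self.running -= 1

    def run(self):
        self.pending.put("")              # "" means the source root itself
        self.set_max_threads(self.max_threads)
        self.pending.join()               # returns once every directory is done
        self.stop.set()
        for t in list(self.threads):
            t.join()
```

To mimic changing the thread count while a copy is in progress, call `set_max_threads` from another thread (a UI callback, for instance) while `run()` is blocked; new workers start at once, while surplus workers retire only after finishing the directory they are on.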
Copying the directory structure, but not the files
Set a bogus file pattern (one that matches no files) and McTool won't copy any files, but it will still create the directory structure.
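This works because directories are created whether or not any of their files match the pattern. A hypothetical Python sketch of that behaviour follows; `copy_tree_filtered` and `NEVER_MATCHES` are invented names and this is not McTool's code. `(?!)` (an empty negative look-ahead) is one pattern that can never match anything, in both Python and .NET.

```python
import os
import re
import shutil

# A pattern that can never match: an empty negative look-ahead.
NEVER_MATCHES = re.compile(r"(?!)")


def copy_tree_filtered(src_root, dst_root, pattern):
    """Copy src_root to dst_root, copying only files whose names match
    `pattern`. Directories are always created, so a pattern that matches
    nothing reproduces just the directory structure.
    Returns the number of files copied."""
    files_copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target, exist_ok=True)
        for name in filenames:
            if pattern.search(name):
                shutil.copy2(os.path.join(dirpath, name), os.path.join(target, name))
                files_copied += 1
    return files_copied
```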
Check source size and then wait
If you're not sure that the files from the source will fit on the destination, set the number of threads to zero (0). McTool will scan the source directories and then send you an e-mail telling you how big they are and how much space is free on the destination. You can then either increase the number of threads to let it copy, or cancel if there's not enough room.
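The same "scan, then compare against free space" check can be approximated outside McTool. This is a hypothetical Python sketch (`scan_source` and `check_fit` are invented names); it uses `shutil.disk_usage` for the free-space figure and ignores cluster/slack overhead and files that already exist on the destination (which would be skipped), so treat its answer as a rough guide.

```python
import os
import shutil


def scan_source(paths):
    """Walk every source path and return (file_count, total_bytes)."""
    files = total_bytes = 0
    for root in paths:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                try:
                    total_bytes += os.path.getsize(os.path.join(dirpath, name))
                    files += 1
                except OSError:
                    pass  # file vanished or is unreadable; a real tool would log it
    return files, total_bytes


def check_fit(paths, destination):
    """Compare the scanned source size with free space on the destination."""
    files, needed = scan_source(paths)
    free = shutil.disk_usage(destination).free
    return {"files": files, "bytes_needed": needed,
            "bytes_free": free, "fits": needed <= free}
```

For example, `check_fit([source_dir], dest_dir)["fits"]` answers the same yes/no question the zero-thread trick answers by e-mail.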
Database Tracking
McTool can log its operations to a database if a valid connection string is present in File | Preferences | Reporting.
Structure of the History table
CREATE TABLE [dbo].[History](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[SourcePath] [varchar](8000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[DestPath] [varchar](400) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[Status] [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Errors] [int] NULL,
[StartTime] [datetime] NOT NULL,
[EndTime] [datetime] NULL,
[Files] [int] NULL,
[FilesCopied] [int] NULL,
[FilesSkipped] [int] NULL,
[Bytes] [bigint] NULL,
[BytesCopied] [bigint] NULL,
[BytesSkipped] [bigint] NULL,
[Dirs] [int] NULL,
[DirsCopied] [int] NULL,
[BytesFreeStart] [bigint] NULL,
[BytesFreeFinish] [bigint] NULL,
[Username] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Machine] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[Version] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_History] PRIMARY KEY CLUSTERED
(
[ID] ASC
) ON [PRIMARY]
) ON [PRIMARY]
Structure of the PathsCopied table
CREATE TABLE [dbo].[PathsCopied](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[SourcePath] [varchar](400) NOT NULL,
[DestPath] [varchar](400) NOT NULL,
[SourceFileCount] [int] NOT NULL,
[DestinationOriginalFileCount] [int] NULL,
[DestinationFinalFileCount] [int] NULL,
[EndTime] [smalldatetime] NOT NULL,
[ParentID] [bigint] NOT NULL,
CONSTRAINT [PK_PathsCopied] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Querying the database
Here are a few queries I have found to be useful:
Running jobs (condensed)
select SourcePath, DestPath, replace(replace(status,'Time Remaining (@ last/total rate): ',''),'complete ','') as Status,
Username, Machine, Errors, StartTime, EndTime, Files, FilesCopied, FilesSkipped, Bytes, BytesCopied, BytesSkipped, Dirs,
DirsCopied, BytesFreeStart, BytesFreeFinish, ID, Version
from History
where datediff(minute, endtime, getdate()) < 2
order by starttime desc
Current copy speed of ALL running jobs in MBPS
select sum(cast(left(right(status, len(status) - charindex('[', status)),
charindex(' ', right(status, len(status) - charindex('[', status))) - 1) as float))
from History
where Status like '~%' and datediff(minute, endtime, getdate()) < 2
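The string handling in that query is hard to read, so here is the same logic in Python: take the text after the first '[', keep it up to the first space, and parse it as a number. The sample Status values below are made up; the real format isn't documented on this page, and they were chosen only so the rate follows a '[' and ends at the next space, which is what the query relies on. The two-minute EndTime filter is not reproduced, and unlike `charindex`, Python's `str.index` raises an error if there is no '['.

```python
def rate_from_status(status: str) -> float:
    """Python version of the T-SQL expression above: text after the
    first '[' up to the first space, converted to a number."""
    after_bracket = status[status.index("[") + 1:]
    return float(after_bracket[:after_bracket.index(" ")])


def total_rate(statuses):
    """Sum the rates of rows whose Status starts with '~' (running jobs)."""
    return sum(rate_from_status(s) for s in statuses if s.startswith("~"))


# Made-up sample rows; the real Status layout may differ.
rows = [
    "~ 42% complete [12.5 MBPS]",
    "~ 10% complete [3.25 MBPS]",
    "Finished [0 MBPS]",
]
print(total_rate(rows))   # 15.75
```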
Totals from the History log
select replace(convert (varchar, convert(money, sum(FilesCopied)), 1), '.00', '') as FilesCopied,
replace(convert (varchar, convert(money, sum(FilesSkipped)), 1), '.00', '') as FilesSkipped,
convert(varchar, convert(money, sum(BytesCopied) / 1099511627776.00), 1) + ' TB' as BytesCopied,
convert(varchar, convert(money, sum(BytesSkipped) / 1099511627776.00), 1) + ' TB' as BytesSkipped,
replace(convert (varchar, convert(money, sum(DirsCopied)), 1), '.00', '') as DirsCopied
from History
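The `convert(varchar, convert(money, ...), 1)` calls in the totals query are a T-SQL idiom for getting thousands separators (style 1 formats money values with commas and two decimals, which the `replace(..., '.00', '')` then strips for the counts). The divisor 1099511627776 is 1024^4, so the "TB" figures are binary terabytes (TiB). A sketch of the same formatting in Python, for anyone pulling these totals into a script; `format_count` and `format_tb` are invented names.

```python
BYTES_PER_TB = 1024 ** 4    # 1,099,511,627,776: the divisor used in the query


def format_count(n: int) -> str:
    """Thousands separators, no decimals (like money/style 1 plus replace '.00')."""
    return f"{n:,}"


def format_tb(n_bytes: int) -> str:
    """Bytes shown as binary terabytes with two decimals, like the query."""
    return f"{n_bytes / BYTES_PER_TB:,.2f} TB"


print(format_count(1234567))          # 1,234,567
print(format_tb(3 * 1024 ** 4 // 2))  # 1.50 TB
```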
Misc
select * from History where destpath like '%gbrmil1914r4pt2%'
select * from History where sourcepath like '%como%' order by starttime
select top 1000 * from PathsCopied where SourcePath like '%bedfordshire%'
select * from PathsCopied where endtime > '2008-09-01' and sourcefilecount <> destinationfinalfilecount
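To try that last query without a SQL Server instance, here is a Python sketch using the built-in `sqlite3` module with the PathsCopied columns mapped to SQLite types and three made-up rows. One thing it shows: a row whose DestinationFinalFileCount is NULL is not returned, because a comparison with NULL is never true; add `OR DestinationFinalFileCount IS NULL` if you want those rows too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PathsCopied (
        ID INTEGER PRIMARY KEY,
        SourcePath TEXT NOT NULL,
        DestPath TEXT NOT NULL,
        SourceFileCount INTEGER NOT NULL,
        DestinationOriginalFileCount INTEGER,
        DestinationFinalFileCount INTEGER,
        EndTime TEXT NOT NULL,
        ParentID INTEGER NOT NULL
    )""")

# Made-up rows: one complete, one short by 3 files, one with no final count.
rows = [
    ("src/a", "dst/a", 10, 0, 10, "2008-09-02 10:00", 1),
    ("src/b", "dst/b", 10, 0, 7, "2008-09-02 10:05", 1),
    ("src/c", "dst/c", 5, 0, None, "2008-09-02 10:06", 1),
]
conn.executemany(
    "INSERT INTO PathsCopied (SourcePath, DestPath, SourceFileCount, "
    "DestinationOriginalFileCount, DestinationFinalFileCount, EndTime, ParentID) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

mismatches = conn.execute(
    "SELECT SourcePath FROM PathsCopied "
    "WHERE EndTime > '2008-09-01' AND SourceFileCount <> DestinationFinalFileCount"
).fetchall()
print(mismatches)   # [('src/b',)]
```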
Enhancements
If you think of other things that could be included in the program (or might be worth considering), or if it doesn't work for what you need to do, let me know. Here is what's on the list or already done:
Done:
- multithreading to speed up copies from multiple directories
- command line compatibility
- configuration file saving
- better/more statistics display
- sliding window for throughput measurement
- validation on directory path strings
- regular expression filename matching
- MD5 (or other hash) file comparison option
- better handling for low and out of disk space problems
- recover from serious hardware/network issues (rebooted machine while a copy was in progress, lost Remote Desktop session, fatal program error, etc.)
- ability to skip and/or abort the scan step (if you know it's going to fit and don't care about the related stats)
- exit when finished processing
- e-mail online once an hour for low disk space
- disk space checking on output
- warn/pause when low disk space
- pause
Still on the list:
- multiple output directories (how would this work with move?)
- speed throttling (isn't it designed to go as fast as possible?)
- option to do a byte comparison before copying if the file sizes are equal (date/time ignored)
- return errorlevel with command line version
- exclude files: file-based list of entries (including regular expressions?)
- file-based include list (with RegEx?)
- copy NTFS permissions
Change History
2009-04-16 ver. 2.6.0.31
- Fix problem with "Skip prescan" checkbox getting disabled after 10 seconds.
- Disable conflicting options when "Force Copy" is selected.
- Changed a directory error to be just a warning until retries expire.
2009-03-24 ver. 2.6.0.16
- Handle millisecond date/time problem on some filesystems.
- Slow down progress bar when things are going fast.
- Reformat e-mail with two spaces at beginning of each line so Outlook won't try to unwrap it.
2009-02-26 ver. 2.6.0.12
- Better color handling of progress bar.
- Better handling of Skip Prescan check box when running.
- Faster checking of source directories. Only checks once instead of going through the full retry/wait cycle.
- Better error handling for compare failures.
2009-02-23 ver. 2.6.0.0
- Added ability to skip the Prescan either before or during program execution.
2009-01-16 ver. 2.5.1.84
- Added ability to compare the files after the Move/Copy. Uses SHA1 hash algorithm.
- Better UI reset.
2009-01-10 ver. 2.5.1.82
- Minor timing changes between prescan and copy threads.
- Better testing for thread completion.
- Minor log file format update.
2009-01-06 ver. 2.5.1.78
- Added tracking for when processing is paused. Pausing will no longer adversely affect the throughput stats.
- Much better error handling and reporting under adverse circumstances (network troubles, etc.).
- Misc other enhancements.
2008-11-21 ver. 2.5.1.52
- Re-enable changing thresholds while running.
- Misc. minor changes.
2008-10-25 ver. 2.5.1.47
- Don’t require a destination directory when using “None” as the Copy Method (like when just checking the size of the source files).
- If the number of threads is set to zero (0) when the prescan finishes, a status report will be e-mailed. Increasing the threads will then allow processing to continue.
2008-10-20 ver. 2.5.1.42
- Changed low space e-mails to only get sent once (don't ignore them anymore!).
2008-10-15 ver. 2.5.1.40
- Added more stats to log file (directory handling method, file counts in front of file sizes).
2008-10-03 ver. 2.5.1.34
2008-06-12 ver. 2.5.23.13
- Pause will now pause everything, not just copy threads.
- Canceling works better.
- A little UI reorg. to make it a bit smaller and show a few more stats.
- Other misc. enhancements/fixes.
2008-01-04 ver. 2.4.2.4
- Drag and drop on the Source path now appends to whatever is already there. Improved formatting of the thread display window.
- Added check for write permissions and/or low disk space on the destination. Fixed a problem with "can't access directory" messages appearing when canceling a process.
- Make sure the destination directory is there before writing the free space test file to it.
- Added Pause feature. Added disk space thresholds for Low and Minimum disk space on the destination. It will send an e-mail (if e-mailing is enabled) when those thresholds are reached. If it hits the minimum threshold, it will automatically pause move/copy processing.
- Stopped low disk space e-mails from being repeated. Added disk space remaining to the summary report.
- Added e-mail settings (File | Preferences).
| Date | Version | Description |
| 12/13/2007 | 2.4.0.1 | Changed filename matching from wildcards to regular expressions. |
| 12/13/2007 | 2.4.0.0 | Added ability to select date/time compare options: Newer, Different, Ignore. |
| 12/12/2007 | 2.3.0.1 | Fixed files being left in the source during moves if they already existed in the destination. |
| 12/11/2007 | 2.3.0.0 | Changed date/time compare handling. Files will now only be copied if they are NEWER. |
| 12/9/2007 | 2.2.1.29 | New logging class. No difference externally, but the class is now available for other projects. |
| 11/28/2007 | 2.2.1.26 | Updated some more error messages. |
| 11/12/2007 | 2.2.1.20 | More accurate time remaining. Better retry on error handling. Updated some error messages. |
| 11/5/2007 | 2.2.1.10 | Fixed: a carriage return in the source path when only one path was given caused an exception while sending the e-mail message. |
| 11/1/2007 | 2.2.1.9 | Added error handling for importing/exporting the settings file. |
| 10/29/2007 | 2.2.1.8 | More efficient sharing of scanning and copying threads. |
| 10/24/2007 | 2.2.1.4 | Additional completion time estimate based on the last throughput measurement. |
| 10/22/2007 | 2.2.1.0 | Rearranged UI to use the screen better. Removed tabs and a couple of other items. |
| 10/20/2007 | 2.2.0.0 | Reworked threading to follow the directory trees from top to bottom. |
| 10/18/2007 | 2.1.0.27 | Added a "threads running" window (drag the main window wider horizontally to see it). |
| 10/16/2007 | 2.1.0.24 | Configuration file saving. Command line version. |
| 10/13/2007 | 2.1.0.20 | More consistent error reporting. |
| 10/10/2007 | 2.1.0.15 | Current BPS stats (sliding window). |
| 10/8/2007 | 2.1.0.0 | Fixes to multithreading for deep directories. |
| 10/1/2007 | 2.0.0.0 | Multithreading. |
| 9/20/2007 | 1.0.4.31 | More stats. Added drag and drop to the path fields. |
| 2/5/2007 | 1.0.4.0 | Added one-way syncing option (McSync). |
| 2/1/2007 | 1.0.3.0 | Now handles multiple source paths. |
| 1/30/2007 | 1.0.2.0 | Wildcard handling. |
| 1/17/2007 | 1.0.1.0 | Added delete tree functionality on the Delete tab. |
| 11/16/2006 | 1.0.0.12 | Added options to set the number of retries and the time between them when file/directory operations fail. |
| 11/16/2006 | 1.0.0.10 | Added cancel button. |
| 11/14/2006 | 1.0.0.7 | Initial release. |