File Upload - Original File MD5Sum Changes

mjohnsonperl Registered Users Posts: 41 Big grins
In testing an app I'm building I came across a behavior that seems odd, and I'm curious why it's happening, and also how to handle the difference in my code.

I have my program get all the stats from the file on disk, _MG_6644.JPG, then build my HTTP PUT request to upload the file to SmugMug. I get a good response back and an ImageID, so I then query that ImageID to retrieve the MD5Sum and Size and compare them against the local file that was just uploaded. It turns out that for this particular file they are different. The image uploads fine and looks the same, so it doesn't APPEAR to be altered, but the MD5Sum and Size indicate otherwise.
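A minimal Perl sketch of that local-versus-server check, assuming the server's Size and MD5Sum have already been pulled out of the response. The helper names `local_file_stats` and `compare_with_remote` are hypothetical, not part of any SmugMug API. The file is read in binary mode, because on Windows a text-mode read translates line endings and no longer matches the bytes that were actually uploaded.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return (size_in_bytes, md5_hex) for a file. binmode makes sure the
# bytes hashed are exactly the bytes on disk (no newline translation).
sub local_file_stats {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return (-s $path, $md5);
}

# Compare the local file against the Size and MD5Sum reported for the
# uploaded image. Returns a list of human-readable differences; an
# empty list means the server copy matches byte for byte.
sub compare_with_remote {
    my ($path, $remote_size, $remote_md5) = @_;
    my ($size, $md5) = local_file_stats($path);
    my @diffs;
    push @diffs, "size: local $size, remote $remote_size"
        if $size != $remote_size;
    push @diffs, "md5: local $md5, remote $remote_md5"
        if lc $md5 ne lc $remote_md5;
    return @diffs;
}
```

With the numbers from this upload, `compare_with_remote('_MG_6644.JPG', 3937814, '81f5dbba917c1cea764ca2feb07e3c25')` would report both a size and an MD5 difference.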

The MD5Sums and Size are identical and unaltered on several other files I've tested.

I tested this thoroughly to confirm that the problem is with this particular file and not some typo in my code. The file I was uploading had been edited in Photoshop (and, I believe, only Photoshop), so I pulled the original file to test it, and the original shot from my camera is fine. The fact that it was edited in Photoshop doesn't explain the problem, though. I could imagine the image having some sort of file corruption, but it opens just fine in every application I've tried. That still doesn't explain why it gets altered when uploaded to SmugMug.

I also tested this with several other tools for uploading images to SmugMug (Simple, Drag & Drop, Olde Faithful, Windows SmugMug Uploader), and all of them had the same effect: the file got altered somehow.

Here are the results of my file upload:
Uploading (File Size: 4042638) (Image ID: 212263417)
HTTP RESPONSE:
<?xml version='1.0' encoding="utf-8" ?>
<rsp stat="ok">
<method>smugmug.images.upload</method>
<ImageID>212263417</ImageID>
</rsp>

filename: _MG_6644.JPG
non-binary size: 4042595
-s size: 4042638
md5: 5b6e9656da0ac71aa5abf599277e2017
state->size: 4042638

smug filename: _MG_6644.JPG
smug size: 3937814
smug md5: 81f5dbba917c1cea764ca2feb07e3c25



I've packaged the script I used to verify this along with the actual file, so if you want to reproduce the results, or find a situation where it doesn't happen, please dig in; just don't bury yourself.
http://digitalmediashelf.com/temp/upload_difference.zip




Any answers or direction would be gladly appreciated. My primary concern is the sync program I'm building: the original image gets altered, and if that can't be prevented, I'll need to decide what to do when the uploaded image ends up being different from the file on the local machine.

Comments

  • devbobo Registered Users, Retired Mod Posts: 4,339 SmugMug Employee
    edited October 24, 2007
    I'm having a look now to see if I can come up with something.

    As a side note, have you considered using the JSON or XML::Simple Perl modules? Both provide a way to convert the SmugMug responses into a hash, which makes parsing so much nicer than this...

    $sm_image_info_root->findvalue('/rsp/Image/attribute::FileName')
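    For comparison, this is a sketch of getting hash-style access to a flat response, like the upload reply in the first post, while staying with XML::LibXML. `rsp_to_hash` is a hypothetical helper; it only handles the root's `stat` attribute plus text-only child elements, not nested structures.

```perl
use strict;
use warnings;
use XML::LibXML;

# Turn a flat <rsp> document into a hash: the root's stat attribute
# plus one key per child element, holding that element's text.
sub rsp_to_hash {
    my ($xml) = @_;
    my $root = XML::LibXML->load_xml(string => $xml)->documentElement;
    my %h;
    $h{stat} = $root->getAttribute('stat') if $root->hasAttribute('stat');
    # './*' selects element children only (text/whitespace nodes skipped).
    $h{ $_->nodeName } = $_->textContent for $root->findnodes('./*');
    return \%h;
}
```

    With the upload reply from the first post, `rsp_to_hash($response)->{ImageID}` gives `212263417`.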

    Cheers,

    David
    David Parry
    SmugMug API Developer
    My Photos
  • GarethLewin Registered Users Posts: 95 Big grins
    edited October 24, 2007
    If you download the file is it the same as the local one? IOW is it the MD5 calc on the server that is wrong, or is the file actually getting modified?
  • mjohnsonperl Registered Users Posts: 41 Big grins
    edited October 24, 2007
    devbobo wrote:
    As a side note, have you considered using the JSON or XML::Simple Perl modules?

    I glanced at JSON a bit, and it looks interesting, but I just didn't find anything to get me started with it quickly, and REST seemed like a very natural approach using XML. I also wasn't sure how well established JSON was with Perl and what documentation or modules I would be able to find when working with it.

    I am also storing a local catalog of the images I'm syncing in my program, and for storing the local settings and cataloged data I decided to use XML. I figure if iTunes can use it to store my entire .mp3 library, I should be able to use it for this. Plus, this way I'm dealing with a single module that lets me parse the REST responses I get from SmugMug and also read, write, and manage the local data in my catalog.

    As far as XML vs. some other method of storing the local data, I just thought XML was a cool method, and wanted to get more familiar with using XML. I also figured if this was the approach I was going to take I'd try and find the BEST solution I could for this method.

    I was reading some discussions about the high memory usage and slower performance of other XML parsing methods, which is why I chose XML::LibXML; it can also write the XML back to a file.
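    A small sketch of reading and writing such a catalog with XML::LibXML. The `<catalog>`/`<image>` layout and the helper names are made up for illustration; they are not the format the app actually uses.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical layout:
# <catalog><image path="..." image_id="..." md5="..." size="..."/></catalog>
sub new_catalog {
    my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');
    $doc->setDocumentElement($doc->createElement('catalog'));
    return $doc;
}

# Append one <image> element with one attribute per key.
sub add_catalog_entry {
    my ($doc, %attr) = @_;
    my $img = $doc->createElement('image');
    $img->setAttribute($_, $attr{$_}) for sort keys %attr;
    $doc->documentElement->appendChild($img);
    return $img;
}

# Look up the MD5 recorded for a path; returns '' if there is no entry.
# Assumes $path contains no double quotes (it is used as an XPath literal).
sub recorded_md5 {
    my ($doc, $path) = @_;
    return $doc->findvalue(qq{/catalog/image[\@path="$path"]/\@md5});
}
```

    Saving is then `$doc->toFile($path, 1)`, and the same `findvalue` XPath style used for the SmugMug responses works on the catalog.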

    I came across this article on PerlMonks by Randal Schwartz:
    http://www.perlmonks.org/?node_id=287656
  • mjohnsonperl Registered Users Posts: 41 Big grins
    edited October 24, 2007
    GarethLewin wrote:
    If you download the file is it the same as the local one? IOW is it the MD5 calc on the server that is wrong, or is the file actually getting modified?

    If I download the "original" file that's on SmugMug, it has both a different file size and a different MD5Sum than the image that was originally uploaded. The file is actually modified in some way, shape, or form after it gets uploaded to SmugMug's servers.
  • devbobo Registered Users, Retired Mod Posts: 4,339 SmugMug Employee
    edited October 24, 2007
    GarethLewin wrote:
    If you download the file is it the same as the local one? IOW is it the MD5 calc on the server that is wrong, or is the file actually getting modified?

    The MD5 calculation is correct: if the MD5Sum is provided in the upload request, the upload still goes through and the image is processed. If the MD5 sum calculated by the server were different from the one provided in the upload request, the upload would fail because the MD5s didn't match.

    I have verified this fact using some upload tools.
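    The check described above can be modelled roughly like this. It is a sketch of the logic only, not SmugMug's server code, and `accept_upload` is a hypothetical name. The point is that an upload which succeeds with an MD5 attached shows the bytes arrived intact, so any later difference has to come from processing on the server.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Model of the server-side integrity check (not SmugMug's actual code):
# if the client supplied an MD5, the received bytes must hash to it,
# otherwise the upload is rejected before any processing happens.
sub accept_upload {
    my ($received_bytes, $claimed_md5) = @_;
    return 1 unless defined $claimed_md5;    # no MD5 supplied: nothing to check
    return lc($claimed_md5) eq md5_hex($received_bytes) ? 1 : 0;
}
```

    On the client side this just means computing `md5_hex` over exactly the bytes sent in the PUT body and including it in the request the way the upload API documents.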
  • GarethLewin Registered Users Posts: 95 Big grins
    edited October 24, 2007
    mjohnsonperl wrote:
    If I download the "original" file that's on SmugMug, it has both a different file size and a different MD5Sum than the image that was originally uploaded. The file is actually modified in some way, shape, or form after it gets uploaded to SmugMug's servers.

    That is very unexpected. devbobo, do you guys do ANY parsing of the file? Maybe trimming off unused EXIF data? Or possibly changing the embedded thumbnail?

    Perhaps the OP can give this image to devbobo to take a look at?
  • mjohnsonperl Registered Users Posts: 41 Big grins
    edited October 24, 2007
    GarethLewin wrote:
    That is very unexpected. devbobo, do you guys do ANY parsing of the file? Maybe trimming off unused EXIF data? Or possibly changing the embedded thumbnail?

    Perhaps the OP can give this image to devbobo to take a look at?

    I uploaded probably 100 images last night and none of them had any MD5Sum differences, meaning none of the original files I uploaded were changed. This only appears to happen to certain images, and I'm not sure yet what pattern causes SmugMug's servers to alter the original file after it's uploaded.

    I put together a nice little package in a .zip file that I linked to on my first post. The package includes the Perl script and two image files that can be used to reproduce the results. One of the images is the one I discovered this problem on, and the other is one that uploads just fine.
  • devbobo Registered Users, Retired Mod Posts: 4,339 SmugMug Employee
    edited October 24, 2007
    OK, the reason the MD5sum is different for that file is that it is AdobeRGB, not sRGB. We automatically convert JPEG images to sRGB if they aren't already.
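    A rough way to spot such files before uploading is to look for an embedded ICC profile and check its description. The sketch below is a heuristic, and the function name and return values are hypothetical: it scans the JPEG's APP2 "ICC_PROFILE" segments for the text "Adobe RGB" or "sRGB". It will miss ICC v4 profiles, whose descriptions are stored as UTF-16, and files that mark Adobe RGB only in EXIF metadata; a library such as Image::ExifTool, which reports tags like ColorSpace and ProfileDescription, is more robust.

```perl
use strict;
use warnings;

# Heuristic: walk the JPEG marker segments up to the start of scan,
# collect APP2 "ICC_PROFILE" chunks, and search the profile bytes for a
# description string. Returns 'adobe_rgb', 'srgb', 'other', or 'none'
# (no embedded ICC profile found).
sub icc_profile_hint {
    my ($jpeg) = @_;
    return 'none' unless substr($jpeg, 0, 2) eq "\xFF\xD8";    # SOI marker
    my ($pos, $icc) = (2, '');
    while ($pos + 4 <= length $jpeg) {
        last unless ord substr($jpeg, $pos, 1) == 0xFF;     # lost sync: stop
        my $marker = ord substr($jpeg, $pos + 1, 1);
        if ($marker == 0xFF) { $pos++; next }               # fill byte
        last if $marker == 0xDA || $marker == 0xD9;         # SOS or EOI
        my $len = unpack 'n', substr($jpeg, $pos + 2, 2);   # includes the 2 length bytes
        last if $len < 2;                                   # malformed segment
        my $data = substr($jpeg, $pos + 4, $len - 2);
        if ($marker == 0xE2 && substr($data, 0, 12) eq "ICC_PROFILE\0") {
            $icc .= substr($data, 14);    # skip chunk sequence number and count
        }
        $pos += 2 + $len;
    }
    return 'none'      if $icc eq '';
    return 'adobe_rgb' if index($icc, 'Adobe RGB') >= 0;
    return 'srgb'      if index($icc, 'sRGB') >= 0;
    return 'other';
}
```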

  • mjohnsonperl Registered Users Posts: 41 Big grins
    edited October 24, 2007
    devbobo wrote:
    OK, the reason the MD5sum is different for that file is that it is AdobeRGB, not sRGB. We automatically convert JPEG images to sRGB if they aren't already.

    OK, I figured it was something like that. It makes sense now, and it gives me something to look for: if I find an image that's not in sRGB, I can expect it to be converted when uploaded.

    Thanks for getting an answer on this one.
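    One way a sync tool could live with this, sketched under the assumption that the catalog stores both the local MD5 and the MD5Sum SmugMug reports right after upload (the field names and return values are hypothetical): compare each side only against its own recorded hash, so a server-side sRGB conversion is not mistaken for an edit.

```perl
use strict;
use warnings;

# Hypothetical sync rule: both hashes are recorded at upload time, and
# each side is compared only with its own recorded value, so a
# server-side conversion (e.g. AdobeRGB -> sRGB) never looks like a change.
sub sync_action {
    my ($entry, $local_md5_now, $server_md5_now) = @_;
    my $local_changed  = lc($local_md5_now)  ne lc($entry->{local_md5});
    my $server_changed = lc($server_md5_now) ne lc($entry->{server_md5});
    return 'conflict'       if $local_changed && $server_changed;
    return 'upload'         if $local_changed;
    return 'remote_changed' if $server_changed;
    return 'none';
}
```

    With the values from this thread, an entry recording `5b6e9656...` locally and `81f5dbba...` on the server would report 'none' as long as neither copy changes afterwards.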