Decentralised Network File Sharing
#11
(Apr 17, 2018, 03:46 am)Hiroven Wrote: About collision-free algorithms: SHA-1 was considered good for a long time, and yet a collision has already been found.

Interesting. But notice the computing power and time needed; if it's that hard to find a collision, it should be just as hard to avoid one.
As I see it, there's still a long way to go. I suppose a non-random number sequence (like a Fourier series) could be used, with deterministic patterns "filling in the blanks" where collisions occur.
If CDs and DVDs can use interleaving techniques to compress data, it should be possible - but maybe not with hashing alone.

(Apr 17, 2018, 05:29 am)Mr.Masami Wrote:
(Apr 17, 2018, 05:16 am)Kingfish Wrote: Probably a bit more feasible over the BitTorrent protocol, since each file is split into pieces, so you would just be recreating each piece.

Yes, it could be just a slight modification of some open-source BitTorrent client to do this simple yet intensive task.
Would this make sense? idk, this is a theory thread, so there's my ;)

@edit: wrote a little asm program that takes any file, generates an MD5 hash for every 32-byte piece and stores them all in a binary file; it's now running on my server trying to brute-force the first piece back.
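In Python the same idea boils down to something like this (the real thing is in asm; the file handling and names here are just placeholders):

Code:
import hashlib
import sys

PIECE_SIZE = 32  # bytes per piece, matching the experiment above

def dump_piece_hashes(src_path, out_path):
    """Write the 16-byte MD5 digest of every PIECE_SIZE-byte chunk of src_path."""
    with open(src_path, "rb") as src, open(out_path, "wb") as out:
        while True:
            piece = src.read(PIECE_SIZE)
            if not piece:
                break
            out.write(hashlib.md5(piece).digest())

if __name__ == "__main__":
    dump_piece_hashes(sys.argv[1], sys.argv[2])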

Any theory is welcome. I'm learning as I go.

This brings to mind my teenage attempt at the lottery - I knew it would be pure luck, but a friend wanted to try.
I put an Apple II with 64 KB to work over the weekend generating combinations, sent him a floppy disk and said: "Now you mark all those tickets and pay for them." :P
Obviously he didn't have that kind of money, nor was such a bet viable; covering every number always costs more than the prize!

I'm curious to see the results - just hope I live long enough...
#12
According to my poor math skills, a 32-byte block has 2^256 combinations, which equals 115792089237316195423570985008687907853269984665640564039457584007913129639936 possible values.
So the hashing algorithm needs to chew through 2^256 * 32 = 3705346855594118253554271520278013051304639509300498049262642688253220148477952 bytes, which equals 3369993333393829974333376885877453834204643052817571560137951281152 TiB of data.
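Spelling that out in Python (plain big-integer arithmetic, nothing assumed beyond the 32-byte piece size):

Code:
# Spell out the magnitudes with plain integer arithmetic.
PIECE_SIZE = 32                        # bytes per piece
combinations = 2 ** (PIECE_SIZE * 8)   # 2^256 possible 32-byte blocks

total_bytes = combinations * PIECE_SIZE
total_tib = total_bytes // 2 ** 40     # 1 TiB = 2^40 bytes

print(combinations)   # 1157920892...639936
print(total_bytes)    # 3705346855...477952
print(total_tib)      # 3369993333...281152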

Piece of cake.

[Image: e2SzWCP.png]

@edit: Tested with 2-byte pieces; reconstruction works 1:1 and fairly fast, but of course there's no point in using pieces smaller than the hash - some special/optimized hashing algorithm just for this purpose would be needed.
Later I will try different piece sizes and maybe write a special hashing function for it; it doesn't need to be sophisticated - actually, the simpler the better.
All we need is a good ratio between hash size and piece size, with fairly good speed.
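Roughly what the 2-byte test looks like in Python, just to show the brute-force loop (the piece size and the MD5 choice are only for illustration):

Code:
import hashlib

PIECE_SIZE = 2  # small enough that brute force over 65,536 candidates is instant

def recover_piece(digest):
    """Return the first PIECE_SIZE-byte candidate whose MD5 matches the digest."""
    for n in range(256 ** PIECE_SIZE):
        candidate = n.to_bytes(PIECE_SIZE, "big")
        if hashlib.md5(candidate).digest() == digest:
            return candidate
    return None

original = b"Hi"
print(recover_piece(hashlib.md5(original).digest()))  # b'Hi'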

In the end it's probably a waste of time, but I just like to experiment with stuff :D
#13
I salute your healthy curiosity and sporting spirit!

I found this: http://burtleburtle.net/bob/crypto/magnitude.html
Quote:

2^256 = Possible 256-bit (32-byte) keys.
2^268 = Instructions executed by a nanocomputer the size of the sun running for a million years.
Calculated by 2^190 atoms / 2^20 atoms/processor * 10^16 instructions/second * 3600*24*365.25*1,000,000.
---

So, taking those figures at face value: one sun-sized nanocomputer does about 2^268 instructions per million years, i.e. roughly 2^223 per second. To sweep all 2^256 candidate 32-byte pieces in one second you would therefore need billions of them - around 8 billion even at N = 1, where N is the instruction count of the hashing algorithm. Run on a single machine, the same sweep takes a couple of hundred N years. So not millions of sun-sized supernanocomputers, but not hundreds either.
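Plugging the quoted figures straight into Python to sanity-check that (the processor count and instruction rate are the page's assumptions; N = 1 is a deliberately silly best case):

Code:
# Figures quoted from burtleburtle.net above.
processors  = 2 ** 190 // 2 ** 20      # atoms in the sun / atoms per processor
instr_per_s = processors * 10 ** 16    # instructions per second for one machine

candidates = 2 ** 256                  # possible 32-byte pieces
N = 1                                  # instructions per hash attempt

ratio = candidates * N // instr_per_s  # machines needed for a one-second sweep...
print(ratio)                           # ~7.7 billion
print(ratio / (3600 * 24 * 365.25))    # ...or ~245 years with just one machine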

Current byte nomenclature can't handle it; vundabytes are 2^33 (11 orders of magnitude).
#14
Calculations done - now all we need is a 1.21-gigawatt power source :D
#15
lol, guess not even the NSA can afford it.
#16
(Apr 18, 2018, 07:03 am)Mr.Masami Wrote: According to my poor math skills, a 32-byte block has 2^256 combinations... In the end it's probably a waste of time, but I just like to experiment with stuff :D
You guys are my heroes =D

No, I BELIEVE this is the FUTURE of file compression. With FPGAs and other specialized processor types appearing, I believe this is the future of the field: arbitrarily leaving out some data to be recalculated later.

I am working on something I call Hierarchical CRC, which just stores the CRCs for specific lengths of data within the source file(s).

The lengths will be stored in the header, so no guessing is needed.

I have two lengths of data: a sector (4 bytes long) and a cluster (8 bytes long).

So I store the checksums for both lengths of data.

And you end up with around twice as many sector checksums as cluster checksums, although you could interleave these, i.e. jump only 4 bytes forward per cluster checksum.

Then, when I weave back together the sectors I bulk-reconstructed earlier (*1), all I have to do is walk the possible sector combinations for each cluster (*2) to weed out collisions; on top of that we use a variable-length checksum, one that grows more complex as the data complexity grows.

I believe the most difficult task would be getting the system to weed out the majority of collisions, so that anything else that appeared to match would be an obvious error on opening.
Collisions are OK as long as they are managed.

What do ya all think?

*1: why not generate the sectors only once (storing them on disk for later use), using the list of sector CRCs as the matching tool.
*2: by checksumming the raw data for two adjacent sectors you can verify them against the cluster CRCs.
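To make that concrete, here's a scaled-down Python sketch - sectors shrunk to 2 bytes and clusters to 4 so the brute force runs in a blink, and a single MD5 byte standing in for whatever checksum ends up being used; all names are just for illustration:

Code:
import hashlib
from itertools import product

SECTOR = 2   # bytes (scaled down from 4 for the demo)
CLUSTER = 4  # bytes (scaled down from 8)

def check8(data):
    """Toy 8-bit checksum (one MD5 byte) standing in for the real CRC."""
    return hashlib.md5(data).digest()[0]

def encode(data):
    """Store one checksum per sector and one per cluster (non-interwoven)."""
    sector_sums = [check8(data[i:i + SECTOR]) for i in range(0, len(data), SECTOR)]
    cluster_sums = [check8(data[i:i + CLUSTER]) for i in range(0, len(data), CLUSTER)]
    return sector_sums, cluster_sums

def sector_candidates(sector_sum):
    """Bulk-reconstruct every sector matching a stored checksum (step *1)."""
    return [bytes(c) for c in product(range(256), repeat=SECTOR)
            if check8(bytes(c)) == sector_sum]

def decode_cluster(sector_sums, cluster_sum):
    """Weave sector candidates together and weed collisions via the cluster checksum (step *2)."""
    left, right = (sector_candidates(s) for s in sector_sums)
    return [a + b for a in left for b in right if check8(a + b) == cluster_sum]

data = b"DATA"                             # one cluster = two sectors
sector_sums, cluster_sums = encode(data)
survivors = decode_cluster(sector_sums, cluster_sums[0])
print(len(survivors), data in survivors)   # hundreds of collisions survive, original included

One caveat on the stand-in: as far as I can tell, a plain CRC at the cluster level is essentially determined by the two sector CRCs (CRCs are linear), so it would weed out nothing; a checksum that mixes non-linearly, like the hash byte above, is what lets the second pass cut the candidate pairs down to a few hundred.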
#17
We aren't geniuses but so far the task is too complex for current knowledge and technology; it is theoretically possible but takes too much time or resources.
How did your compression model fare in tests?
#18
(Nov 26, 2018, 09:13 am)dueda Wrote: We aren't geniuses but so far the task is too complex for current knowledge and technology; it is theoretically possible but takes too much time or resources.
How did your compression model fare in tests?

It hasn't been tested yet - it's very much a little work in progress - but the compression factor is pretty arbitrary; if you go with an 8-bit CRC per sector and an 8-bit CRC per cluster (non-interwoven), that's 3 bytes of checksums per 8 bytes of data, so better than 50% compression.


I propose to limit the search length to 256*256*256*256 = 4,294,967,296 combinations of 4 bytes, or about 16,384 MiB of candidate data - fairly reduced CPU requirements, don't you think, or do I have my head up my ass somewhere?

I know! I'll take some meth and see if I can finish the encoder tonight - that's the easy part. Maybe then I can start work on the fun part, the decoder. (Popeye the Sailor Man plays quietly in the background)

If I do finish the task, it will be for single files only, and it will be assumed that you have already applied deduplication-based compression.

Oh, and just to totally phone the whole project in, I'm gonna use the nice easy language of VB .Net.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

So I still haven't finished the encoder, and I can't answer a basic question: how much should the final checksum grow? Like 1 extra byte per megabyte?

I suppose I should ask: does anyone know at what file sizes a hash or checksum algorithm starts suffering from too many collisions? For MD5 and the like?
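As a rough yardstick (not a real answer), the birthday bound says an n-bit hash gives you about even odds of at least one colliding pair after roughly 1.18 * 2^(n/2) random pieces. In Python, for CRC32- and MD5-sized digests over 32-byte pieces (piece size just matches the earlier posts):

Code:
import math

def pieces_for_coin_flip_collision(hash_bits):
    """Approximate piece count for a ~50% chance of some collision (birthday bound)."""
    return 1.1774 * math.sqrt(2 ** hash_bits)

PIECE_SIZE = 32  # bytes, matching the earlier experiments
for name, bits in [("CRC32", 32), ("MD5", 128)]:
    pieces = pieces_for_coin_flip_collision(bits)
    print(f"{name}: ~{pieces:.3g} pieces, ~{pieces * PIECE_SIZE:.3g} bytes hashed")

Note that this only says when two pieces of the same file start sharing a digest; reconstructing a piece from its digest runs into the much harsher problem that any piece longer than the hash has astronomically many preimages.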