Storing files on Hive Blockchain: Part 1 - Planning

in LeoFinance, 3 months ago


For the last couple of weeks I have been trying to figure out whether it is possible to store files like images, PDFs, audio, video, etc. on the Hive blockchain. Hive is a decentralized database, so in theory it should be possible. The question now is whether it would be efficient and useful to do so.

All transactions and data in Hive are stored in blocks, and these blocks are limited in size: each block can hold no more than 65,536 bytes. Many large files won't fit into one block. To store a file larger than the block size limit, it would need to be split into pieces before storing. Later, when we need to retrieve the file, we should be able to put the pieces back together into its original form.
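As a rough illustration of the split-and-reassemble idea, here is a minimal Python sketch. The 60,000-byte piece size and the function names are assumptions made for the example, and nothing in it talks to Hive; it only shows that cutting bytes into ordered pieces and joining them again is lossless.

```python
import hashlib

# Assumed piece size, chosen only to stay under the 65,536-byte block limit.
PIECE_SIZE = 60_000


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Cut data into consecutive pieces of at most piece_size bytes."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def join_pieces(pieces: list[bytes]) -> bytes:
    """Concatenate the pieces, in order, back into the original bytes."""
    return b"".join(pieces)


if __name__ == "__main__":
    original = bytes(range(256)) * 1000  # 256,000 bytes of sample data
    pieces = split_into_pieces(original)
    restored = join_pieces(pieces)
    # Compare checksums to confirm nothing was lost or reordered.
    assert hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()
    print(len(pieces), "pieces")
```

The order of the pieces matters, so whatever records where each piece is stored also has to record its position in the sequence.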

As I was experimenting with various aspects of this project, I ended up writing many small scripts that each perform a certain function. I had to stop at the point where I needed to put all the various functions together to achieve the desired outcome. In other words, my plan lacked proper planning. That's why I am writing this post: to organize my project a little better for future reference, and hopefully to get some feedback on how to achieve the goal more efficiently.

Project: Storing Files on Hive blockchain.

Goal of the project: To store image, document, audio, and video files on the Hive blockchain, then retrieve these files from the blockchain, and finally add the ability to share them with others.

  • Step 1: I will use the Streamlit Python module for GUI purposes. Streamlit has built-in file upload functions that easily provide file sizes, types, and names.
  • Step 2: Uploaded files will need to be encrypted. The encrypted version of the file will be split into pieces of about 60,000 bytes. These pieces will be stored on the Hive blockchain using the custom_json operation. After each piece is stored, its exact or close to exact location will need to be recorded separately. Once all pieces are stored on the blockchain, the list of locations for the pieces will be stored separately with a wallet transfer in an encrypted memo.
  • Step 3: A separate account will hold the wallet transfers with encrypted memos that give more details on where the file pieces are stored and in which order.
  • Step 4: Now that we can store files, we need the ability to retrieve them, read/view them, and/or share them with others. A separate Streamlit app will be used for this purpose. First, to list all the files stored by a user, we check the dedicated account that keeps track of wallet transfers with encrypted memos in a certain format. Clicking a chosen file will get the location data from the memo, retrieve the file pieces, put them together, and present the file in its original format.
  • Step 5: The first four steps only allow storing and retrieving files by one user. The next step is to make it possible for this user to share the files with other users. One approach is for user1 to send the decryption code to user2, which gives user2 full access to the files. The problem with this arises if user1 changes their mind and would like to revoke user2's ability to view the files.
  • Step 6: Only share a temporary decryption code with the users the files are shared with. After some time the temporary code can expire or can be changed manually by the user sharing the files. I am not sure yet how this can be achieved. One idea is to send part of the decryption code to the user the files are being shared with, while the other part is sent to the app.
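The "partial decryption code" idea in Step 6 could be sketched roughly as follows. This is only one possible reading of it, using a plain XOR split into two shares; the function names are hypothetical and nothing here is tied to Hive or to any particular encryption library.

```python
import secrets


def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split key into two shares; the key is recovered only when both are combined."""
    share_a = secrets.token_bytes(len(key))               # random pad, same length as the key
    share_b = bytes(k ^ a for k, a in zip(key, share_a))  # key XOR pad
    return share_a, share_b


def combine_shares(share_a: bytes, share_b: bytes) -> bytes:
    """XOR the two shares together to rebuild the original key."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must have the same length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


if __name__ == "__main__":
    key = secrets.token_bytes(32)         # stand-in for the file's decryption key
    user_share, app_share = split_key(key)
    assert combine_shares(user_share, app_share) == key
```

In this sketch one share would go to user2 and the other would stay with the app. If the app later discards or replaces its share, the old user share no longer rebuilds the key through the app. It does not undo access that has already happened: anyone who has already seen the full key or the decrypted file keeps it, and anything written to the chain stays there.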

I haven't thought through all the pieces yet, and I'm not even sure if all of the above steps make sense. I will give it a try one step at a time and see what the outcome is. At the very least, it will be a learning experience for me. If you have any ideas for achieving the goals described above in a more efficient manner, please let me know in the comments.

Posted Using LeoFinance Beta


Please for the love of god no. XD

As a witness who runs one of the servers across the world that HIVE lives on, the absolute last thing I want people doing is using the blockchain as megaupload or zippyshare.

While it's true that the cost of memory is going down, the way the hived executable works gets jumbled up in a lot of cases, and bottlenecks in some of its core architecture make things take longer than they should.

If you start allowing super easy data uploads to HIVE, then all of a sudden chain size bloat becomes a real issue, the operating costs of the entire network go up due to the need to store this new bloat on the chain, and ultimately it ends up being not so much an asset as a massive, expensive inconvenience to the blockchain itself.

With stuff like IPFS and whatnot already blazing trails in the whole decentralization of data and alternative hosting means I'm not sure why making something like this would honestly be the best use of your skills and resources.

Anyways, just my 2 cents. Glad to see you thinking about projects for HIVE, but personally I'd dissuade you from this whole idea. It's been visited in the past by a few folks and every time it ends up running into the issue of chain bloat.

Edit: @sanjeevm has a pretty good idea on how this could be made feasible!

Thanks, mate, glad I waited to skim through the comments before hitting the keyboard to write basically the same thing.

What I should add, though, for @geekgirl, is that the custom_json operation should be used only in cases where it acts like some sort of cursor / key / hash / layer 2 for something that needs governance and immutability (hive-engine?). It's a waste to use the governance and immutability of Hive, which is a fast, reliable, cheap blockchain, for something relatively trivial like storing data. I understand the need, and it's a problem that needs a solution; I just think there are better solutions than Hive for that.

It looked to me like using a racing horse to carry logs from the wood. Just because they are both horses, it doesn't mean they're both very good at the same thing.

Other than that, keep building!


This project is so wrong on many levels. Why on earth would you want to do that? Lolz. It's better to use custom_json as a reference to data stored somewhere else.
This gets me thinking though: what if we inject viruses into the blockchain? They can't be deleted; third-party software would have to actively block the code.

WHATS WRONG WITH IPFS has been screaming in the back of my head since I read this.


Was my knee jerk reaction as well once I got enough info on the project goal.

@geekgirl would IPFS be a solution?
Maybe ipfs-desktop or IPFS Cluster can help.


indeed, just seeing how easily you linked those is enough reason to recommend IPFS over competition, it's well documented and commonly understood

I am not really a developer. I am just learning and do things to learn and practice. I wouldn't be able to do anything sophisticated enough that would cause massive use and bottleneck issues for the blockchain.

However, since I read someone say that Hive is not capable of handling 500 million tweets on a daily basis, I was curious to see what Hive can handle and where its scaling limits are. If you are worried that the possibility of storing files may cause issues for Hive, that makes me think you agree that Hive is not capable of scaling and accommodating hundreds of millions of users in the future.

I don't know how IPFS works. I just experiment with things from time to time, and I find Hive to be the place to learn and experiment. For me personally it is nothing more than a learning playground. Thank you for sharing the concerns and potential issues.

Video is still much heavier than millions of tweets.
Please do not open the door to storing video and pics on Hive, as it could quite likely damage the blockchain.


Video probably won't be possible; it would take a very long time. I can see a benefit in original art or photos being stored, though. Anyway, this is just an experiment.

Compared to text, photos are also enormous.
I totally see your idea, and I remember being disappointed myself when finding out that only text is stored on the blockchain. And even there, scaling has been an issue.
But this simply is the limitation, at least nowadays.
I know this is an experiment, but please be aware of the risk this idea poses. If it becomes usable, you might crash Hive.

I love this project idea!

These pieces will be stored on Hive blockchain using custom_json operation.

You may have to use actual comment operations if there's a byte size limit of 8192 for custom_json, which I think there is.


Yes, you are right, there is an 8192-byte limit for custom_json. I just tested it. I didn't know that. Thank you.
Do you know why there is such a limit for custom_json?

One of the devs probably added it to prevent bloat.

I'm not sure if this limitation only exists on client-side libraries (python/javascript) or if it's the server that's saying no.
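To get a feel for what the 8192-byte limit means in practice, here is a back-of-the-envelope Python sketch. It assumes each piece is Base64-encoded inside the JSON and that 192 bytes per operation are set aside for the JSON keys, a file id, and a piece index; both the encoding choice and the 192-byte allowance are assumptions for the example, not Hive rules.

```python
import math

CUSTOM_JSON_LIMIT = 8192  # limit reported in this thread
ENVELOPE_BYTES = 192      # assumed allowance for JSON keys, file id, and piece index


def raw_bytes_per_op(limit: int = CUSTOM_JSON_LIMIT, envelope: int = ENVELOPE_BYTES) -> int:
    """Raw file bytes that fit in one operation after Base64 encoding."""
    b64_chars = limit - envelope
    return (b64_chars // 4) * 3  # Base64 turns every 3 raw bytes into 4 characters


def ops_needed(file_size: int) -> int:
    """Number of custom_json operations needed for a file of file_size bytes."""
    return math.ceil(file_size / raw_bytes_per_op())


if __name__ == "__main__":
    print(raw_bytes_per_op())       # raw bytes per operation under these assumptions
    print(ops_needed(1024 * 1024))  # operations for a 1 MiB file
```

Under these assumptions each operation carries 6,000 raw bytes, so a 1 MiB file would need 175 operations.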

The actual data is still stored on #IPFS.

The interface does authenticate with HIVE but only for broadcasting a file descriptor to the blockchain. That extra step allows people to find and clone your work with ease.
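A file descriptor of that kind might look something like the following Python sketch. The field names are hypothetical, the location string is a placeholder, and the SHA-256 checksum is only a stand-in for integrity checking (it is not an IPFS content identifier). No broadcasting is done here.

```python
import hashlib
import json


def build_descriptor(name: str, data: bytes, location: str) -> dict:
    """Describe a file stored elsewhere: name, size, checksum, and where to fetch it."""
    return {
        "name": name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "location": location,  # e.g. an off-chain address; placeholder in this sketch
    }


def descriptor_json(descriptor: dict) -> str:
    """Serialize compactly, since on-chain payload size is limited."""
    return json.dumps(descriptor, separators=(",", ":"), sort_keys=True)


def matches(data: bytes, descriptor: dict) -> bool:
    """Check that fetched bytes are the ones the descriptor refers to."""
    return (
        len(data) == descriptor["size"]
        and hashlib.sha256(data).hexdigest() == descriptor["sha256"]
    )


if __name__ == "__main__":
    content = b"example file content"
    desc = build_descriptor("example.txt", content, "ipfs://EXAMPLE-CID-PLACEHOLDER")
    print(descriptor_json(desc))
    assert matches(content, desc)
```

Only the small descriptor would go on chain; the file itself stays wherever the location points.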

I thought you knew enough about Hive to know that this is not such a great idea.

Oh no, a useless photographer with zero skills is trying to bring me down : (

If you perceive one sentence as trying to bring you down, well, I will not finish the secon

This is a fantastic idea and I've personally been waiting for something like this to be built on Hive for some time! Great initiative.

Thank you.

I recall somebody testing the limits of resource credits, which involved posting one extremely long post with random text. What you are attempting is similar to how Usenet distributed photos and videos back in the day. There were apps that could download and piece together the posts to render the content.

You do touch on a major point that we have way more resource credits than we use in our day to day interaction. However, it is perhaps this underutilization that makes Hive as fast as it is.


Yes, you are right, there are plenty of resource credits even for small accounts. I was testing the custom_json operation with Beem on an alt account with 5 HP, and it wasn't running out of resource credits for a few operations. I will find out more as I test with larger texts/files.

IPFS is a good fit for this: you can store the files on it and then refer to them from Hive. The @threespeak team can definitely provide some insight because they are building on it. And I am sure I have seen some other people here building on it, even a project that was storing in IPFS and interacting with Hive.

Doesn't IPFS have some stability issues, and doesn't it require active pinning of files? Why aren't we using storj or sia? Just wondering.

Thank you. I need to learn more about how IPFS works. I don't know much about it.

I've been curious if anyone would do this. Very cool.

Thank you. I am not sure if I can actually do it. But I will try.

Please do. I think this is an extremely important tool, if it is possible. You can do it!


I like this idea a lot! You have my vote


Thank you.

I am not really good at coding. But isn't storing images already enabled on the Hive blockchain, even though only small ones?
It would be very nice if we could do that, to store big images or videos on the Hive chain.


No, images are stored on separate servers off chain. Images in posts are not stored in the blockchain, only links to those images are stored in the blockchain.

Does that mean that when those servers are gone, all that will still be stored in the chain is the links and the text?

Yes, you are right.

I've dabbled with encoding rss feeds into account metadata. While what you're talking about is possible, key expiry is going to be very hard to organize and in general this is just a really bad idea for the health of Hive.

The only thing that would happen if this became easy to do is that Resource Credits would have to be drastically devalued and smaller accounts would be restricted.

It sounds like a fun intellectual effort (I'm pretty sure it's been done before), but incentives for secure off-chain storage, like what @threespeak is working on, are a much better long-term plan.


It is just experimental. I just want to see what is possible and what the limitations of Hive are. I would assume resource credits would serve as protection for Hive itself, as broadcasting operations to store data would probably require a lot of resource credits, based on size. For retrieving data, however, resource credits don't play any role.

If there were a fee in HIVE to use the service, it might limit abuse and help fund the infrastructure increases needed to host more, larger blocks. It would also add a new utility to the token.

NO. The blockchain isn't meant to store files. This can open a Pandora's box of all sorts of unwanted files, increase blockchain bloat, and potentially slow down transactions for other users if the blocks are being filled with junk by attackers.

Won't resource credits put a limit on any kind of misuse of broadcasting data to the chain? I would assume storing a massive amount of data would require a massive amount of resource credits, which would make it too expensive to do so.
What are we going to do when we have tens of millions of people using Hive? Will that slow down transactions and make it impossible for Hive to scale?

Goal of the project: To store image, document, audio, and video files on Hive blockchain.

Isn't that what @3speak is aiming to do?

Probably. 3Speak is working on a product that everybody will use. I don't think 3Speak will be storing the files on the Hive blockchain itself, though. I am just experimenting to see what's possible with Hive.

Censorship-resistant immortal uploads. Great idea!

Thank you.

A very bad idea. Please don't!

The Hive blockchain is NOT designed to store files, nor is it designed to manage shares, permissions, quotas and so on. That's the purpose of file systems.
Add to this that any user wanting to store data will face issues with RC usage.

Better to move to something else more useful.

Like you said, any such usage would come with the cost of RC, which would limit the amount of data anybody could store based on HP/RC. I do understand what you are saying. While Hive is not designed for such use, I am still curious about its scaling capacity.

It seems like storing large files would be inefficient anyway.

It's an interesting idea, but as I understand the tech it would bloat things beyond good and evil. Since somebody is gonna have to pay for it... If this is purely an exercise for you, you might want to look into Base64-encoding. Good luck, and have fun.
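For reference, Base64 is what lets arbitrary binary data travel inside a text format such as JSON. A minimal Python sketch using only the standard library follows; the payload key name `"d"` is a made-up example.

```python
import base64
import json


def encode_piece(piece: bytes) -> str:
    """Turn raw bytes into ASCII text that is safe to embed in JSON."""
    return base64.b64encode(piece).decode("ascii")


def decode_piece(text: str) -> bytes:
    """Recover the raw bytes from their Base64 text form."""
    return base64.b64decode(text)


if __name__ == "__main__":
    piece = bytes(range(256))                        # includes bytes that are not valid text
    payload = json.dumps({"d": encode_piece(piece)}) # "d" is a hypothetical key name
    restored = decode_piece(json.loads(payload)["d"])
    assert restored == piece
    print(len(piece), "raw bytes ->", len(encode_piece(piece)), "Base64 characters")
```

The encoded text is about a third larger than the raw data, so the encoding itself adds to the storage cost.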

Yes, it is nothing more than a personal project to see the capabilities of the Hive blockchain. Thank you.

I would say no to the blockchain, but yes to integration with IPFS. Storing files on the blockchain like that, as others have said, would be extremely taxing on the witnesses.

What made you decide between Streamlit vs IPFS file sharing?


Streamlit is just for quick GUI purposes, nothing more. I don't know much about IPFS. I need to learn more about it.

it is way more commonly understood by the general public, I would highly suggest looking into using IPFS from peer 1 all the way through to peer 2.

@klye would adding an IPFS module to your witness node be much of a headache?

What is wrong with using IPFS? Can it be done this way?

FWIW @geekgirl I tried this in 2016 (Edit : it was 2014) or so with Dogecoin and they ripped me a new one for even suggesting it, even with magnet links. I linked you some code in Discord that's on github that is standalone if you're interested.


Your post is reblogged and upvoted by me. It is a good post. Thank you @geekgirl