Ever since we’ve been able to use computers to collect and analyse data, we have always been seeking to improve lives with data-driven insights.
The intersection of blockchain technology and data is one of the most exciting areas of innovation today.
We gathered a few speakers who work in the blockchain and data space to share their thoughts:
- Andrea Armanni, Product Strategist at Ocean Protocol, which builds next generation tools to unlock data at a large scale.
- Dinidh O’Brien is the head of PR and marketing of Data Lake, whose team is working to transform medical research by putting anonymised patient consents for the use of their data for research on the blockchain. He has been in blockchain and crypto space since 2012, and has also been a consultant to companies, helping them integrate blockchain and DLT into their business.
- Jackie Tan is Head of Academy at Tribe, a Web3 ecosystem builder, training tech talent to be ready for blockchain & Web3. He also teaches the data science Masters programme at Nanyang Technological University. Before joining Tribe, he co-founded UpLevel, a data science academy as well as a financial insights company.
Read what we chatted about below!👇
- (1) How has Data Science evolved? How does Blockchain come in?
- (2) What Opportunities & Projects are you most excited about?
- (3) How will innovations in AI and machine learning (like DALL-E and Stable Diffusion) be integrated with Web3/Blockchain?
- (4) What are Key Challenges faced in this space today?
- (5) Any Career Tips for budding Blockchain Data Scientists?
- Related Articles
(1) How has Data Science evolved? How does Blockchain come in?
Data Science is a field of study where you draw insights from large amounts of data. Those insights can be used to explain or, better, predict things.
The use cases for data science can range from the status of diseases to tomorrow’s stock market prices.
An emerging field in data science is blockchain data analysis. There are a lot of blockchains today. And they offer a treasure trove of data.
There’s an increasing intersection between blockchain and data science, where data scientists can use the data on the chains to do everything that traditional data scientists do.
Andrea, Ocean Protocol
Data science is about making data useful. And blockchain introduces a lot of use cases by offering high-quality data to the data science community.
We can see that from the new business models like Nansen and Dune Analytics that have made blockchain data more readable.
And if a data scientist wants to extract data from the blockchain, they usually use an available dataset, or APIs from the blockchains, or commercial solutions that can be quite costly.
What’s exciting is the growth of Python libraries such as Bitcoin.py or even Ocean.py, which provide raw and aggregated data. They streamline data pipelines so that data scientists can extract value from each phase of the value creation loop, which is really exciting.
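To make the "APIs from the blockchains" route a little more concrete, here is a minimal sketch of what pulling one block might look like. It assumes an Ethereum-style node that speaks JSON-RPC (the panel did not name a specific chain), and it uses a small made-up response instead of a live network call. The helper names `build_block_request` and `summarise_block` are hypothetical, chosen only for this illustration.

```python
import json


def build_block_request(block_number, request_id=1):
    """Build a JSON-RPC payload asking an Ethereum-style node for one block."""
    return {
        "jsonrpc": "2.0",
        "method": "eth_getBlockByNumber",
        # Quantities are hex strings; True asks for full transaction objects.
        "params": [hex(block_number), True],
        "id": request_id,
    }


def summarise_block(response):
    """Turn the hex-encoded fields of a block response into plain numbers."""
    block = response["result"]
    transactions = block["transactions"]
    total_wei = sum(int(tx["value"], 16) for tx in transactions)
    return {
        "number": int(block["number"], 16),
        "timestamp": int(block["timestamp"], 16),
        "tx_count": len(transactions),
        "total_value_eth": total_wei / 10**18,  # 1 ETH = 10**18 wei
    }


# Made-up response for illustration only; a real node returns many more fields.
SAMPLE_RESPONSE = json.loads("""
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "number": "0x10",
    "timestamp": "0x5f5e100",
    "transactions": [
      {"value": "0xde0b6b3a7640000"},
      {"value": "0x6f05b59d3b20000"}
    ]
  }
}
""")

request_payload = build_block_request(16)
summary = summarise_block(SAMPLE_RESPONSE)
```

In practice the payload would be sent to a node over HTTP and the summaries collected into a table for analysis. Commercial tools and the libraries mentioned above exist largely to hide this decoding step.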
(2) What Opportunities & Projects are you most excited about?
Dinidh, Data Lake
What’s exciting for me about where data science and blockchain meet is that it solves one of the unseen issues.
I know that data scientists are working hard to find correlations to results and analysis. But speaking from the medical field, the provenance for quality and accuracy of data is the primary struggle they have right now.
And that’s why blockchain is really useful for trustworthy data analysis, since what is recorded on it is cryptographically and mathematically provable.
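One way to picture "cryptographically provable" provenance is a hash chain, where every record stores the hash of the record before it. The sketch below is only an illustration of that general idea using Python's standard `hashlib`. It is not how Data Lake or any particular blockchain is implemented, and the record fields and function names are invented for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def entry_hash(payload, prev_hash):
    """Hash a record together with the hash of the record before it."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Append a record that is linked to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({
        "payload": payload,
        "prev": prev_hash,
        "hash": entry_hash(payload, prev_hash),
    })


def verify_chain(chain):
    """Return True only if every record still matches its stored hash and link."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["payload"], entry["prev"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical, anonymised consent records (made up for illustration).
records = []
append_entry(records, {"patient": "anon-001", "consent": "research-use"})
append_entry(records, {"patient": "anon-002", "consent": "research-use"})
```

Calling `verify_chain(records)` passes on the untouched list. If anyone later edits an earlier record, its hash no longer matches and the check fails, which is the property that lets an analyst trust where the data came from.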
Andrea, Ocean Protocol
Yes, people are now more aware of how data is used. It’s only a matter of time before it evolves into data sovereignty, so I find any project that works to unleash this data economy really exciting.
I think the monetisation of scientific data is pretty exciting. I have a background in scientific research, having done some research in the past.
There are 2 kinds of results in scientific publishing – successful and not so successful results. In the scientific world, you tend to publish only the good ones. But “bad” results are also important. However, they’re usually hidden because there’s no perceived value.
If you can monetise, or be OK with, “failure” or not-so-successful results, I think it will be great for the scientific community.
The projects I’m interested in are the ones that increase the use of blockchain data for predictive purposes.
For example, I’m working on using machine learning to predict scam tokens on decentralised exchanges.
So what it means is we try to predict the chances that a project is trying to do a rug pull.
To do that, we look at patterns and amounts in wallet transactions to gauge how likely it is that a project is legitimate or just trying to do a pump and dump.
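As a rough illustration of the kind of model described here, the sketch below trains a tiny logistic regression from scratch on two made-up wallet features. Everything in it is an assumption for the sake of the example: the features (share of supply held by the largest wallet, share of liquidity withdrawn in the first day), the synthetic training rows, and the function names. It is not the actual model or feature set used in the project mentioned above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scam_probability(features, weights, bias):
    """Estimated probability that a token with these features is a scam."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = scam_probability(features, weights, bias) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Synthetic rows, invented for illustration only:
# [share of supply held by top wallet, share of liquidity pulled in 24h]
TOY_ROWS = [
    [0.90, 0.80], [0.85, 0.95], [0.70, 0.90], [0.75, 0.70],  # labelled scam
    [0.10, 0.05], [0.20, 0.00], [0.15, 0.10], [0.30, 0.05],  # labelled legitimate
]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(TOY_ROWS, TOY_LABELS)
risky_score = scam_probability([0.80, 0.90], weights, bias)
safe_score = scam_probability([0.10, 0.00], weights, bias)
```

On this toy data the concentrated, quickly drained token gets a score above 0.5 and the other one falls below it. A real system would need far more features, real labelled history, and careful validation before anyone relied on its scores.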
Yeah I find your project interesting, because in today’s world, data or the movement of information is still pretty siloed.
For example, information about how money moves from bank to bank is pretty siloed right now. But it’s different today with the blockchain – you can see the movement of data, and I think it’s a really interesting opportunity for data scientists.
Of course, there could be bad players who might target or doxx certain people, but the good players out there, like what Jackie is trying to do to identify rug pulls or even whales manipulating the market, I think it’s really exciting.
I’ve been trading crypto for 12 years, and I could have used your project 12 years ago! I really support projects with tangible, real-world value, so I’m all for what Ocean Protocol is doing as well to solve the problem of sharing large amounts of data.
For me, I’m excited about projects in the DeSci space – projects that are solving scientific problems through decentralisation and Web3 technologies.
For example, LabDAO is a decentralised citizen scientist project that gives funding in a decentralised way. Good research sometimes doesn’t find funding. Decentralised funding helps to build consensus on what the world at large would like to fund, which is pretty exciting.
The project I’m in, Data Lake, is at the intersection of blockchain and medical data specifically. We are quickly moving towards becoming a leader and trailblazer in DeSci because of how we’re using the blockchain to give users sovereignty over their medical data.
So yes, the applied, non-speculative use of crypto and blockchain data – not just meme coins – is exciting for me.
(3) How will innovations in AI and machine learning (like DALL-E and Stable Diffusion) be integrated with Web3/Blockchain?
AI and machine learning are significantly lowering the cost of creation. They’re making it possible for anyone to create new things at a fraction of the cost and time. For example:
- Text
  - OpenAI’s GPT-3 is trained on 1 trillion words from Common Crawl (a non-profit that scrapes billions of webpages monthly), books, and Wikipedia. Since the average book has 100K words, GPT-3 has essentially read ten million books.
  - GPT-3 can easily generate essays from prompts like: “Write a scary essay about how my dog ate my homework.”
- Art and Digital Media
  - Stable Diffusion is a deep learning model that was trained on 5 billion image-text pairs (images that have HTML alt-text attributes). Unlike other AI models, Stable Diffusion was open sourced to the public. Since then, there has been a Cambrian explosion of innovation built on top of the model.
- Music and Sound
  - Unlike text and images, music is usually under copyright. That’s why there’s no Stable Diffusion for music yet, but it might be a matter of time. Meanwhile, researchers have focused on sound:
    - OpenAI’s Whisper is a “sound to text” speech recognition AI trained on 680K hours of voice data from the web.
    - AudioGen is a “text to sound” AI. You can type “whistling and wind blowing” and the AI will generate the corresponding sounds.
How do you think these innovations will impact Web3?
These are generative models that help create new things, so I think they can help artists and designers to speed up their process and create NFTs faster.
As for the blockchain, it’s a bit of a contrast since blockchain is used to record past data. I think there’s still a bit of a gap between AI’s generative prediction and creation of data, and blockchain technology, which is used to store data.
Andrea, Ocean Protocol
I think we will face issues with data and GDPR, as well as computation. There’ll be a need to use a larger machine for running the models.
Yes, the models are huge, but what’s amazing about the deep learning community is how we’re working to make it accessible.
For example, Stable Diffusion initially needed a 100 GB GPU, 12 graphics cards. But over the next couple of months that came down, and now you need a 12 GB GPU, so you can essentially run it on your own laptop.
I think open sourced projects and communities are certainly exciting, and I can’t wait to see what we can come up with, perhaps even at the intersection of Web3 and AI.
(4) What are Key Challenges faced in this space today?
Besides what I mentioned about the provenance of quality of data, I think that there are key challenges in talent needs for legal and compliance. On the blockchain, you need developers and auditors, and since this space is still nascent, costs are high, and there’s still a lack of expertise.
Another challenge is convincing stakeholders in order to get support or even funding. For us in the medical healthcare sector, it moves slower. Research can move quickly when funded well. But hospitals are slower to change. And especially given the bad rep from scam tokens, it takes a lot of convincing. That will change in the coming years, especially when projects apply proper real-world use cases.
Andrea, Ocean Protocol
I agree, talent is really an issue. CryptoJobs is doing a really good job sourcing talent.
And we’re waiting to see how regulations will help convince users that blockchain can be safe and not a scam.
I would emphasise educating traditional folks on the utility of Web3 solutions. The scientific space can be slow to adopt new ideas. But one thing that I’m optimistic about is the fact that you can monetise data. And since money is a good incentive, hopefully this will drive things forward.
I think that in Web3 and blockchain, we are able to create tokens, and with the right design and incentives, we can get people to do the right thing to move things forward, like how you at Data Lake have your own token as well.
Yes, for us as we talk to traditional stakeholders in the EU, we tokenised our project as a way to incentivise and do cross-border transactions since dealing with fiat is just too complicated.
In some cases, there is legislation against rewarding people for their contributions, such as giving blood, and that carries over to medical data as well.
For us, a token is actually the perfect solution to navigate the current legislative landscape.
For sure, when we talk to traditional stakeholders, they’ll ask why a token. It’s new, they’re not used to it.
But once they go through that process and understand why we use a token and what the technology can do to reward participation and automate things, the responses are positive.
In my experience, tokens are not barriers to investment or participation, and we do see positive responses despite an industry that’s known to be less receptive to change. There’s acceptance and excitement once people understand fully.
Legalities are interesting. A fun fact is that we can’t get paid for donating blood, but we can be paid for donating blood plasma, even though it’s a similar process.
Correct, legalities in the medical field are very complicated. For us, we need a really strong legal team in place before we even launch a project. And we have to be very flexible and adaptive to changes in regulations as well.
Legalities are there to protect us, but sometimes things get so bureaucratic that it hinders progress. So yes, I think a strong legal team is very important when you’re venturing into the Web3 or blockchain space.
(5) Any Career Tips for budding Blockchain Data Scientists?
Andrea, Ocean Protocol
Find your niche and stay consistent! There’s a lot of noise, so it can be overwhelming.
Find something you’re passionate about and stay focused. Keep building!
Focus fully on 1 thing first. Your fundamentals need to be good.
For example, make sure you have a strong base in blockchain development, or in data science, before getting into blockchain data.