Ocean Protocol Meetup
Originally posted January 9, 2019
I hosted a meetup.com meeting today to talk about Ocean Protocol and other data sources for machine learning, and to lead a group discussion of startup business ideas involving curating and selling data. The following is from a handout I created from material on the Ocean Protocol web site and other sources:
Data Trumps Software
Machine learning libraries like TensorFlow, Keras, and PyTorch, and the people trained to use them, have become commodities. What is not yet a commodity is the availability of high-quality, application-specific data.
Effective machine learning requires quality data
- Ocean Protocol https://oceanprotocol.com - is an ecosystem based on blockchain for sharing data that serves the needs of both data producers who want to monetize their data assets and data consumers who need specific data that is affordable. This ecosystem is still under development, but portions of the infrastructure (which will all be open source) are already available. If you have Docker installed you can quickly run their data marketplace demonstration system https://docs.oceanprotocol.com/setup/quickstart/.
- Common Crawl http://commoncrawl.org - is a free source of web crawl data that was previously only available to large search engine companies. There are many open source libraries for accessing and processing crawl data. The easiest way to get started is to download a few WARC data segment files to your laptop. My open source Java and Clojure libraries for processing WARC files are at https://github.com/commoncrawl/example-warc-java
- Amazon Public Dataset Program - is a free service for hosting public datasets. If you have data to share, AWS evaluates applications to contribute data quarterly. To find useful datasets, search using the form at https://registry.opendata.aws and use the S3 bucket URIs (or ARNs) to access them. Most data sources have documentation pages and example client libraries.
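The WARC files mentioned above have a simple layout: each record starts with a block of plain-text headers, a blank line, then the payload bytes. Here is a minimal sketch of parsing one record's header block using only the Python standard library (in practice a library such as warcio handles compression, payload boundaries, and malformed records for you; the synthetic record below is illustrative):

```python
import io

def parse_warc_headers(stream):
    """Parse one WARC record's header block from a binary stream.
    Returns a dict of header fields, or None at end of stream."""
    line = stream.readline()
    if not line:
        return None
    headers = {"WARC-Version": line.decode("utf-8").strip()}  # e.g. "WARC/1.0"
    # Header lines are "Key: Value" pairs, terminated by a blank line.
    for raw in iter(stream.readline, b"\r\n"):
        if not raw:
            break
        key, _, value = raw.decode("utf-8").partition(":")
        headers[key.strip()] = value.strip()
    return headers

# A tiny synthetic WARC record for illustration:
sample = (b"WARC/1.0\r\n"
          b"WARC-Type: response\r\n"
          b"WARC-Target-URI: http://example.com/\r\n"
          b"Content-Length: 13\r\n"
          b"\r\n"
          b"Hello, crawl!")

headers = parse_warc_headers(io.BytesIO(sample))
print(headers["WARC-Target-URI"])  # http://example.com/
```

After the headers, you would read exactly Content-Length bytes of payload; real crawl segments are gzip-compressed, record by record.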
Overview of Ocean Protocol
Ocean Protocol is a decentralized data exchange protocol that lets people share and monetize data while providing control, auditing, transparency, and compliance to both data providers and data consumers. The initial Ocean Protocol digital token sale ended in March 2018 and raised $22 million. Ocean Protocol tokens will be purchasable with Ethereum Ether and can be used by data consumers to buy access to data. Data providers will be able to trade tokens back to Ethereum Ether.
Terminology
- Publisher: a service that provides access to data from data producers. Data producers will often also act as publishers of their own data.
- Consumer: any person or organization who needs access to data. Access is via client libraries or web interfaces.
- Marketplace: a service that lists assets and facilitates access to free datasets and datasets available for purchase.
- Verifier: a software service that checks and validates steps in transactions for selling and buying data. A verifier is paid for this service.
- Service Execution Agreement (SEA): a smart contract used by providers, consumers, and verifiers.
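To make these roles concrete, here is a toy sketch of how a SEA coordinates a purchase: the publisher delivers data access, a verifier validates the delivery step, and payment is released only after verification. All class and function names are illustrative stand-ins, not Ocean Protocol's actual API:

```python
from dataclasses import dataclass

@dataclass
class ServiceAgreement:
    """Toy stand-in for a Service Execution Agreement (SEA)."""
    asset_id: str
    price: int
    delivered: bool = False
    verified: bool = False
    paid: bool = False

def execute_agreement(agreement, deliver, verify):
    # The publisher delivers access to the data asset...
    agreement.delivered = deliver(agreement.asset_id)
    # ...a verifier validates the delivery step (and is paid for this)...
    agreement.verified = agreement.delivered and verify(agreement.asset_id)
    # ...and the consumer's payment is released only after verification.
    agreement.paid = agreement.verified
    return agreement

sea = ServiceAgreement(asset_id="did:op:1234", price=10)
result = execute_agreement(sea,
                           deliver=lambda asset: True,
                           verify=lambda asset: True)
print(result.paid)  # True
```

The real SEAs are smart contracts executed on-chain by keeper nodes, so no single party can skip the verification step.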
Software Components
- Aquarius: a service for storing and managing metadata for data assets, backed by the off-chain database OceanDB.
- Brizo: a service used by publishers to manage interactions with marketplaces and data consumers.
- Keeper: a service that runs a blockchain client and uses Ocean Protocol to process smart contracts.
- Pleuston: an example/demo marketplace that you can run locally with Docker on your laptop.
- Squid Libraries: client libraries to locate and access data (currently Python and JavaScript are supported).
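A typical consumer workflow with these components is: search asset metadata (served by Aquarius), resolve an asset's identifier (DID) to its metadata record, then purchase and download. Here is an illustrative sketch of that flow with a plain dictionary standing in for the metadata store; the names here are not the real squid-py API, which was still evolving at the time of writing:

```python
# Toy metadata store standing in for Aquarius.
METADATA = {
    "did:op:abc": {"name": "Weather observations", "price": 5,
                   "url": "https://example.com/data.csv"},
    "did:op:def": {"name": "Traffic counts", "price": 8,
                   "url": "https://example.com/traffic.csv"},
}

def search_assets(store, keyword):
    """Search asset metadata by keyword, as a marketplace would via Aquarius."""
    return [did for did, meta in store.items()
            if keyword.lower() in meta["name"].lower()]

def resolve_asset(store, did):
    """Resolve a DID to its metadata record."""
    return store[did]

matches = search_assets(METADATA, "weather")
meta = resolve_asset(METADATA, matches[0])
print(meta["price"])  # 5
```

In the real system the purchase step would execute a Service Execution Agreement through a keeper node, and Brizo (on the publisher's side) would grant the actual data access.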
Also of interest: SingularityNET
https://singularitynet.io is a decentralized service that supports creating, sharing, and monetizing AI services and hopes to be the world’s decentralized AI network. SingularityNET was started by my friend Ben Goertzel to create a marketplace for AI service APIs.