Building Search: NASA Library 🚀
How to build an AI-powered search in a day using Curiosity
Anyone who has worked at a large enterprise knows this problem: you have lots of documents spread across your company, but it is very difficult, if not impossible, to find and navigate any of this knowledge.
Today we’ll look at how you can use Curiosity to create your own enterprise search from scratch, in minutes!
To get started, you’ll need:
Curiosity (running on your machine, server or cloud)
Lots and lots of files (we’ll use a nice public dataset from the NASA library)
If you’re running Windows, the easiest way to test-drive Curiosity is to download the installer from our website.
Alternatively, you can also run it using Docker on any operating system:
mkdir -p ~/curiosity/storage
docker run \
  -p 8080:8080 \
  -v ~/curiosity/storage/:/data/ \
  -e MSK_GRAPH_STORAGE=/data/ \
  curiosityai/curiosity
In either case, once Curiosity is running, open your browser at http://localhost:8080/ and you should see the login screen.
The default username and password for an empty system are admin / admin. Don’t forget to change the password during the initial setup!
After your first login, you’ll see the initial setup screen. This short assistant guides you through the minimum configuration needed to get your system up and running. In the Make it your own step, you can customize the name and logo: we’re calling our system Space Library, and we picked an icon from FlatIcon to match! Just remember that the icon must be in SVG format.
Continue to the Languages step, click on English and then click on Continue.
You’ll need one more thing before jumping to the data ingestion: an API token.
To create one, open the Data Sources hub, click External Connectors and then Create API Token, enter a name for the token, and copy the generated token somewhere safe for now. We’ll use it in the next step!
Let’s get the data going 🚀
For the data ingestion, we’ve prepared a dump of the NASA library ready for you to use. It includes both the files and the metadata available on their site (with information on Authors, Organizations and Categories). The code for the data connector is available on our GitHub page, and all you need to do is clone the repository and run the project inside it. Ahh, you’ll also need the latest .NET SDK installed on your computer 😀.
Let’s first get the data into the system, and then we’ll explain how we’re modeling the files and metadata in the knowledge graph, and how to configure the front-end to explore this data. Run the following on your command line, replacing MY_API_TOKEN with the token you generated previously.
git clone https://github.com/curiosity-ai/space-library/
cd space-library
dotnet run http://localhost:8080/ MY_API_TOKEN
This might take a while, depending on your internet connection, as the code first downloads the 45GB dataset from our servers, and then uploads it to your local Curiosity instance.
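Inside the connector, the two positional arguments from that command have to become a server address and a token. Here is a minimal sketch of how that could be done, assuming only the argument order shown in the command (server URL first, then API token). The ConnectorArgs type and its validation rules are hypothetical and are not taken from the space-library repository.

```csharp
using System;

// Hypothetical helper: the positional order (server URL, then API token)
// mirrors the `dotnet run` command above; everything else is an assumption.
public sealed record ConnectorArgs(Uri Server, string ApiToken)
{
    public static ConnectorArgs Parse(string[] args)
    {
        if (args.Length != 2)
            throw new ArgumentException("Usage: dotnet run <server-url> <api-token>");

        // Accept only absolute http(s) URLs such as http://localhost:8080/
        if (!Uri.TryCreate(args[0], UriKind.Absolute, out var server) ||
            (server.Scheme != Uri.UriSchemeHttp && server.Scheme != Uri.UriSchemeHttps))
            throw new ArgumentException($"Not a valid server URL: {args[0]}");

        if (string.IsNullOrWhiteSpace(args[1]))
            throw new ArgumentException("The API token is empty.");

        return new ConnectorArgs(server, args[1]);
    }

    // Keep the token out of logs: the default record ToString would print it.
    public override string ToString() => $"ConnectorArgs {{ Server = {Server}, ApiToken = *** }}";
}
```

Overriding ToString matters for a record because the compiler-generated version prints every property, which would leak the token into any log line that prints the arguments.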
Meanwhile, let’s look at the source code to see what it does. The core of this data connector does three things: it downloads the data, creates the data schema on the Curiosity system, and uploads the data and its metadata into the knowledge graph.
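The three stages can be outlined as follows. This is a sketch under our own assumptions: ConnectorPipeline and the delegate names are hypothetical placeholders, not the methods in the space-library repository, and the actual Curiosity client calls are left out rather than guessed. What it captures is the ordering: the schema is created before any data that uses it is uploaded.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical outline of the connector's three stages. The names are
// placeholders, not the methods used in the space-library repository.
public static class ConnectorPipeline
{
    public static async Task RunAsync(
        Func<Task> downloadDatasetAsync,     // 1. fetch the files + metadata dump
        Func<Task> createSchemaAsync,        // 2. declare node and edge types
        Func<Task> uploadWithMetadataAsync)  // 3. push files, nodes and edges
    {
        // Each stage is awaited before the next one starts, so the upload
        // only runs once the schema it relies on has been created. If a stage
        // throws, the exception propagates and the later stages are skipped.
        await downloadDatasetAsync();
        await createSchemaAsync();
        await uploadWithMetadataAsync();
    }
}
```

Passing the stages in as delegates keeps the outline testable without a running Curiosity instance.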
The CreateSchemaAsync method is responsible for creating the graph schema, which is required because the Curiosity graph is a strongly typed property graph.
Each node schema must define a Key field that uniquely identifies each node of that type, and can define other properties as needed.
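To make the Key rule concrete, here is a hedged sketch of what node schemas for this dataset could look like. The Node, Key and Property attributes are hypothetical stand-ins declared in the snippet itself; the real connector uses the types from the Curiosity client library, which may be named differently. The node types and field names (Document, Author, Organization, Category, Id, Title, and so on) are our assumptions based on the metadata described earlier.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for schema attributes. The real connector uses the
// attributes from the Curiosity client library, whose names may differ.
[AttributeUsage(AttributeTargets.Class)]
public sealed class NodeAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Field)]
public sealed class KeyAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Field)]
public sealed class PropertyAttribute : Attribute { }

// Assumed node types for the NASA metadata; field names are illustrative.
[Node]
public class Document
{
    [Key] public string Id;              // unique identifier of the library entry
    [Property] public string Title;
    [Property] public string Summary;
    [Property] public DateTimeOffset PublishedOn;
}

[Node] public class Author       { [Key] public string Name; }
[Node] public class Organization { [Key] public string Name; }
[Node] public class Category     { [Key] public string Name; }

public static class SchemaChecks
{
    static FieldInfo[] KeyFields(Type nodeType) =>
        nodeType.GetFields(BindingFlags.Public | BindingFlags.Instance)
                .Where(f => f.GetCustomAttribute<KeyAttribute>() != null)
                .ToArray();

    // The rule from the text: every node type declares exactly one Key field.
    public static bool HasSingleKey(Type nodeType) => KeyFields(nodeType).Length == 1;

    // Reads the key value that would identify this node in the graph.
    // Throws if the type does not declare exactly one Key field.
    public static string KeyOf(object node) =>
        Convert.ToString(KeyFields(node.GetType()).Single().GetValue(node));
}
```

One design consequence worth noting: using the author’s name as the key means two different people with the same name would collapse into a single Author node; a stable identifier, if the metadata provided one, would avoid that.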
To define the graph edges, we like to create a helper class that lists all the edge names.
We also use a reflection-based method to collect all edge names from this class (you can find it in the connector’s source code on GitHub), and then use them to create all the edge schemas on the graph.
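A minimal sketch of both pieces follows, assuming the helper class holds one string constant per edge type. The specific edge names (HasAuthor/AuthorOf and so on, as forward and inverse pairs) are our own illustration for this dataset, not copied from the repository, and the call that registers each name as an edge schema in Curiosity is omitted rather than guessed.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical edge-name helper: one string constant per edge type, whose
// value equals the constant's own name. The pairs below are assumptions.
public static class Edges
{
    public const string HasAuthor       = nameof(HasAuthor);
    public const string AuthorOf        = nameof(AuthorOf);
    public const string HasOrganization = nameof(HasOrganization);
    public const string OrganizationOf  = nameof(OrganizationOf);
    public const string HasCategory     = nameof(HasCategory);
    public const string CategoryOf      = nameof(CategoryOf);

    // Reflection-based listing of every edge name declared above, so a newly
    // added constant is picked up automatically when edge schemas are created.
    public static string[] All() =>
        typeof(Edges)
            .GetFields(BindingFlags.Public | BindingFlags.Static)
            .Where(f => f.IsLiteral && f.FieldType == typeof(string)) // const strings only
            .Select(f => (string)f.GetRawConstantValue())
            .ToArray();
}
```

Using nameof keeps each value in sync with its identifier when you rename an edge. Note that GetFields does not guarantee declaration order, so code that consumes Edges.All() should not depend on the order of the returned array.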