Show HN: Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc.

(exopriors.com)

53 points | by Xyra 3 hours ago

9 comments

barishnamazov 1 hour ago
I like that this relies on generating SQL rather than just being a black-box chat bot. It feels like the right way to use LLMs for research: as a translator from natural language to a rigid query language, rather than as the database itself. Very cool project!
Hopefully your API doesn't get exploited and you are doing timeouts/sandboxing -- it'd be easy to do a massive join on this.
I also have a question, mostly because I'm not knowledgeable in this area -- have you noticed any semantic bleeding when research spans your datasets? E.g., "optimization" probably means different things on ArXiv, LessWrong, and HN. I'm wondering whether the vector searches account for this, given a more specific question.
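To make the timeout/sandboxing concern concrete, here is a minimal sketch of a guard around LLM-generated SQL. It is not the project's implementation, whose backend isn't described in this thread; SQLite from the Python standard library stands in for it, and `run_guarded`, `QueryRejected`, `timeout_s`, and `max_rows` are hypothetical names chosen for illustration.

```python
import sqlite3
import time


class QueryRejected(Exception):
    """Raised when a query is refused or exceeds its time budget."""


def run_guarded(conn, sql, timeout_s=1.0, max_rows=1000):
    """Run one SELECT with a wall-clock budget and a row cap (illustrative only)."""
    stripped = sql.strip().rstrip(";")
    # Crude allow-list: a single statement that starts with SELECT.
    # A read-only database role or connection is the stronger guard in practice.
    if ";" in stripped or not stripped.lower().startswith("select"):
        raise QueryRejected("only a single SELECT statement is allowed")

    deadline = time.monotonic() + timeout_s

    def over_budget():
        # A non-zero return value makes SQLite abort the running statement.
        return 1 if time.monotonic() > deadline else 0

    # Ask SQLite to call the handler every 1000 virtual-machine instructions.
    conn.set_progress_handler(over_budget, 1000)
    try:
        # Rows are produced lazily, so the cap also bounds the work done.
        return conn.execute(stripped).fetchmany(max_rows)
    except sqlite3.OperationalError as exc:
        raise QueryRejected(f"query aborted: {exc}") from exc
    finally:
        # Clear the handler so later statements are not affected.
        conn.set_progress_handler(None, 1)
```

With a guard like this, a three-way cross join over even a small table is cut off once it exceeds `timeout_s`, while ordinary lookups return normally. On a server database the same idea is usually expressed with a per-session statement timeout and a read-only role rather than application-side checks.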
kburman 7 minutes ago
> a state-of-the-art research tool over Hacker News, arXiv, LessWrong, and dozens
What makes this state of the art?
7777777phil 1 hour ago
Really useful. I'm currently working on an autonomous academic research system [1] and thinking about integrating this. Currently using a custom prompt + Edison Scientific API. Any plans to make this open source?
[1] https://github.com/giatenica/gia-agentic-short
nineteen999 1 hour ago
That's just not a good use of my Claude plan. If you can make it so a self-hosted Llama or Qwen 7B can query it, then that's something.
mentalgear 1 hour ago
Nice, but would you consider open-sourcing it? I (and I assume others) am not keen on sharing my API keys with a third party.
gtsnexp 52 minutes ago
Is the appeal of this tool its ability to identify semantic similarity?
bugglebeetle 1 hour ago
Seems very cool, but IMO you’d be better off doing an open source version and then a hosted SaaS.
octoberfranklin 47 minutes ago
"Claude Code and Codex are essentially AGI at this point"
Okaaaaaaay....
- Hamuko 25 minutes ago
  I have noticed that Claude users seem to be about as intelligent as Claude itself, and wouldn't be able to surpass its output.