FAQ

What is Thorian AI and who do you work with?

Thorian is a data platform that empowers IT/data teams to be 10x more efficient with the data pulls, visualizations, and reports that tend to slow organizations down. We use proprietary data processing methods to deploy fine-tuned AI agents across an organization’s data stores, significantly reducing the time it takes for “data people” like us to provision the custom information their organizations need. Data leaders get a single operational hub that becomes the shared system for collecting information, coordinating work, and sharing data with permissions across the organization. It’s not “one more tool.” It’s an administrative backbone that reduces structural risk and increases the speed of output without compromising security.

Users experience two specific improvements: 1) we make it easier for them to find and pull precise data across siloed and multi-modal data stores; and 2) we facilitate data automations, moving organizations toward a task-focused approach in which stakeholders request completed data tasks and managed AI agents carry them out. This reduces support hours, makes it easier to communicate data concepts to less-technical employees, improves security and auditability, and strengthens final data products without hiring additional technical staff.

As a product-led company, we deploy this technology wherever a mission-aligned organization is looking to move on from siloed data infrastructure. Our primary areas of focus are political organizations, state/local government agencies, and small businesses such as manufacturers, construction companies, and certain financial firms.

If you’re a product-led company, how does your work on the Epstein Files fit into your roadmap?

We sometimes take on custom data projects when they meet a few key conditions, namely 1) we’re solving an interesting data challenge that moves our company forward; 2) the resulting product can serve as a showcase of our data capabilities; and 3) the project is aligned with our core AI principles and values.

The Epstein Files partnership with Courier is a perfect example of a project that meets all three of these conditions. From a data showcase perspective, the Epstein Files are a public-facing example of the kind of challenges we see organizations facing on a daily basis: the volume of data is too large to wrangle with off-the-shelf tools; the semantic taxonomy and the range of file types are highly complex; and the dataset becomes more valuable to users when they can draw connections between its different parts.

From a mission-driven perspective, Thorian is dedicated to making sure the power of AI is deployed in the public interest. We believe that ensuring the Justice Department meets its responsibility to release the files is in the public interest.

How is your Epstein Files Database different from other attempts to make sense of the Files?

The challenge with the Epstein Files is not just volume. It’s also data complexity, inconsistent and shifting redactions, messy file formats, and the need to protect victims wherever possible. Our work is different because we processed, tagged, and provided easily searchable titles and descriptions for every file and every entity in the entire cache.

Most other applications have fully processed only a small portion of the files, namely those that directly pertain to key people of interest. We have a complete list of all entities (people, organizations, etc.), each with its own count and description. With this application, Courier’s readers and supporters (as well as other journalists) can dig more deeply into the timelines and connections that make the files interesting under the surface.

To the best of our knowledge, our application is the world’s first complete and publicly searchable Epstein Files database.

Have you put together any other big public datasets like this?

In 2025, we mapped the entirety of the Texas legislative session (bills, bill analyses, fiscal notes, hearings, etc.) and made that information searchable for hundreds of staffers and observers. That dataset was an excellent challenge because legislative data is, by nature, constantly changing. This forced us to build continuously updating processes that detected changes to the data and normalized and deduplicated new data in context.
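
For readers curious what "detecting changes and deduplicating" can look like in practice, here is a minimal, generic sketch. It is an illustration under stated assumptions, not a description of Thorian's actual pipeline: the `fingerprint` and `sync` functions, the `bill_id` key, and the in-memory `store` dictionary are all hypothetical names chosen for this example. The idea is to hash a whitespace-normalized copy of each record and compare it with the last hash seen for the same identifier.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Hash a normalized copy of a record so content changes are cheap to detect."""
    # Normalization (assumed for this sketch): lowercase field names and
    # collapse runs of whitespace in values.
    normalized = {
        str(k).strip().lower(): " ".join(str(v).split())
        for k, v in record.items()
    }
    payload = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def sync(store: dict, incoming: list, key: str = "bill_id") -> dict:
    """Merge incoming records into store and report added/updated/unchanged ids."""
    report = {"added": [], "updated": [], "unchanged": []}
    for record in incoming:
        record_id = record[key]
        new_hash = fingerprint(record)
        previous = store.get(record_id)
        if previous is None:
            report["added"].append(record_id)
        elif previous["fingerprint"] != new_hash:
            report["updated"].append(record_id)
        else:
            # Same content as before: a duplicate, so nothing is rewritten.
            report["unchanged"].append(record_id)
            continue
        store[record_id] = {"fingerprint": new_hash, "record": record}
    return report
```

Calling `sync` repeatedly against the same `store` as new data arrives yields, on each pass, the identifiers that are new, changed, or duplicated, so only new or changed records need further processing.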

Are you primarily focused on parsing data that already exists in a contained space, like organizational data or the Epstein Files, or do you work with data acquisition tasks as well?

We recognize that sometimes an organization’s data is incomplete, and key insights can be gained by bringing in additional sources and combining them smoothly with existing data. We therefore also have sophisticated data acquisition methods that we can deploy for this purpose. Some examples of this work include pulling government data from the Census Bureau and state education agencies for a child education/welfare nonprofit; extracting company profile data from internet sources and the U.S. Patent and Trademark Office; and collecting targeted information from TikTok for a voter information project.

What makes your technology unique within the market?

What sets Thorian AI apart is our approach to data ingestion and processing. Once ingested, data is modeled and then processed using our proprietary pipelines, preparing it for complex queries with precise, context-aware answers. Because we constrain and ground our AI agents within an organization’s specific context, our automations are both very efficient and very accurate in a way that is impossible with off-the-shelf AI deployments. We are LLM-agnostic, so we can integrate any model, whether self-hosted or cloud-based. This allows us to tailor each deployment to the user’s data and privacy needs.

I’m worried about data security. Are you training any AI models on my data? How do you make sure my data stays protected?

First of all, your data is yours, period. We don’t use your data to inform or improve any other projects, nor do we train any models with your data for use with other customers.

Our standard security provisions often exceed those required by major certifications, and we can provide single-tenant, on-prem, and HIPAA/FERPA-compliant solutions. In a typical engagement: 1) client data is never accessible by the public, under any circumstances; 2) data is only accessible via secure, authenticated channels; 3) data is stored separately and is never commingled with other clients’ data; 4) access is granted per individual user, following least-privilege / need-to-know principles; 5) access and key actions are logged for monitoring and auditability; 6) data is encrypted in transit and at rest; and 7) organization-wide identity controls consistently govern access.
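
To make points 4 and 5 concrete, here is a minimal, generic sketch of per-user, deny-by-default access checks with an audit trail. It is an illustration only and does not describe Thorian's actual implementation: the `AccessController` class, its method names, and the in-memory storage are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessController:
    """Per-user grants with deny-by-default checks and an audit trail."""

    grants: dict = field(default_factory=dict)     # user -> set of dataset names
    audit_log: list = field(default_factory=list)  # one entry per access check

    def grant(self, user: str, dataset: str) -> None:
        """Give one user access to one dataset (least privilege: nothing broader)."""
        self.grants.setdefault(user, set()).add(dataset)

    def can_read(self, user: str, dataset: str) -> bool:
        """Return True only if access was explicitly granted; log every attempt."""
        allowed = dataset in self.grants.get(user, set())
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "allowed": allowed,
        })
        return allowed
```

In this sketch a user with no explicit grant is always refused, and both successful and refused attempts are recorded, which is what makes later review possible.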

I have a custom data project. How do I know if it’s a fit for Thorian?

We’re happy to have a conversation about other custom data projects, especially those that sit in areas of public interest. If you have an idea, contact us through the website and describe what you’re looking for. If it’s a fit, we’ll follow up!