Overview
Prerequisites
Before you begin, please ensure you have the following set up:
- **The `aos` command-line interface (CLI)**: This is the primary tool for interacting with the AO ecosystem. If you don't have it installed, open your terminal and run the install command shown after this list. If you're new to `aos` or want a refresher, we highly recommend reading the official AO Cookbook guide on connecting `aos` with HyperBEAM nodes.
- **An Arweave wallet**: You'll need a wallet (such as Wander, formerly ArConnect) to create your `aos` process. Your process ID is your identity on the network and is required to receive credits; you can print it with `ao.id`, as shown below.
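
To install the CLI, the AO Cookbook's standard instruction is the npm command below (check the Cookbook for the current command in case it has changed):

```bash
npm i -g https://get_ao.g8way.io
```

Running `aos` afterwards starts (or reconnects to) your personal process.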
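Once you're inside an `aos` session, you can print your process ID at any time with the built-in `ao.id` global:

```lua
-- In the aos REPL: evaluates to this process's ID, the address you
-- share in order to receive credits.
ao.id
```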
Key Concepts & Terminology
To use our service effectively, you'll need to understand these core concepts:
- **HyperBEAM vs. Legacy Network**: The AO ecosystem consists of two main environments: the legacynet and the newer, high-performance HyperBEAM. Our APUS AI Service is built exclusively on HyperBEAM to leverage its unique capabilities for GPU integration.
- **Router Process**: The Router Process is the "front door" to the APUS AI service. It's a single, stable AO process that you send all your inference requests to. It manages the request queue, validates payments, and dispatches tasks to available GPU workers. You will use this process ID as the `Target` for all your interactions (see the sketch after this list).
  - Router Process ID: TED2PpCVx0KbkQtzEYBo0TRAO-HPJlpCMmUzch9ZL2g
  - Note: the previous router, Bf6JJR2tl2Wr38O2-H6VctqtduxHgKF-NzRB9HhTRzo, is still active but experiencing backpressure.
- **Credits**: Credits are the units of payment for using the APUS AI service. Every inference request consumes a fixed amount of credits.
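
To make the flow concrete, here is a minimal sketch of a request sent from `aos`. The `Infer` action name and the plain-text prompt payload are illustrative assumptions, not the service's confirmed message schema; check the API reference for the exact tags.

```lua
-- Minimal request sketch from the aos REPL.
-- Assumption: the router accepts an "Infer" action with the prompt in Data;
-- verify the real tag names against the service's API reference.
APUS_ROUTER = "TED2PpCVx0KbkQtzEYBo0TRAO-HPJlpCMmUzch9ZL2g"

Send({
  Target = APUS_ROUTER,  -- every request goes to the router process
  Action = "Infer",      -- hypothetical action name
  Data = "Summarize the AO protocol in one sentence."
})

-- Replies arrive in your Inbox; Inbox[#Inbox] is the most recent message.
```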
Service Capabilities & Hackathon Limitations
To ensure a smooth and predictable experience for all participants during the hackathon, our service is operating with the following specifications:
- **AI Model: Gemma-3-12B (fixed)**: We have fixed the model to Gemma-3-12B. While our infrastructure supports multiple models, frequently switching between them incurs significant performance overhead; fixing the model ensures that every developer gets fast and consistent response times.
- **Context Window: 32K tokens**: Your prompt and the generated response share a context window of 32,000 tokens. This is the total memory the model has for a single conversation turn. Be mindful of this limit when designing long, conversational agents (a token-budgeting sketch follows this section).
- **Session Management (KV cache)**: A "session" in our service refers to the model's short-term memory (the KV cache) of your ongoing conversation. Here's what you need to know:
  - Persistence is not guaranteed: we do not promise to retain your session's KV cache indefinitely.
  - System-wide limit: the service retains a maximum of 100 active sessions across all users.
  - Developer takeaway: your agent can hold short, stateful conversations, but for long-term memory you must manage the conversational history within your own AO process and pass the relevant context in each new prompt (see the sketch below).
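
Because your prompt and the model's reply share the 32K-token window, it helps to budget tokens before sending. The exact count depends on Gemma's tokenizer, which isn't exposed here, so this sketch falls back on a rough four-characters-per-token heuristic; both constants are assumptions to tune.

```lua
-- Rough token budgeting for the shared 32K context window.
-- Assumption: ~4 characters per token, a crude English-text heuristic.
local CONTEXT_TOKENS = 32000
local RESERVED_FOR_REPLY = 4000  -- leave headroom for the generated answer

local function estimateTokens(text)
  return math.ceil(#text / 4)
end

-- Keep only the most recent history entries that fit alongside the prompt.
function FitToBudget(history, prompt)
  local budget = CONTEXT_TOKENS - RESERVED_FOR_REPLY - estimateTokens(prompt)
  local kept, used = {}, 0
  for i = #history, 1, -1 do            -- walk newest to oldest
    local cost = estimateTokens(history[i])
    if used + cost > budget then break end
    table.insert(kept, 1, history[i])   -- restore chronological order
    used = used + cost
  end
  return kept
end
```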
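And since KV-cache persistence is best-effort, a simple pattern is to keep the transcript in your own process state and replay as much of it as fits with every request. The sketch below reuses `APUS_ROUTER` and `FitToBudget` from the snippets above and keeps the hypothetical `Infer` action; the reply matcher is likewise an assumption.

```lua
-- Long-term memory pattern: the transcript lives in your process state,
-- and each request resends the recent context that fits the budget.
History = History or {}

-- Ask the service a question, prefixed with trimmed conversation history.
function Ask(prompt)
  local context = FitToBudget(History, prompt)
  table.insert(context, "User: " .. prompt)
  Send({
    Target = APUS_ROUTER,
    Action = "Infer",  -- hypothetical action name, as above
    Data = table.concat(context, "\n")
  })
  table.insert(History, "User: " .. prompt)
end

-- Record replies from the router so they become part of future context.
Handlers.add(
  "apus-reply",
  function(msg) return msg.From == APUS_ROUTER end,
  function(msg)
    table.insert(History, "Assistant: " .. msg.Data)
  end
)
```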