Part 3 – Exploring The Couchbase Go SDK

I’ve settled on a pattern when I discover tech I want to know more about. I go through the example apps of course, and if I’m still interested I build a “playground” app where I run through different scenarios where I think the tech will help me.

Why CouchBase?

I definitely want a document database running in the lab because there are a lot of different schema management options that I want to experiment with. I work with MongoDB already so that was my first inclination. I looked around however and stumbled across Couchbase. I was intrigued by the fact that there was some in-memory functionality support.

Would that negate the need for me to rely on Redis for in memory services? It’s a question I’d like to answer. After confirming there was a free, fully functional local install and that there was a Go SDK, I decided to go for it.

What’s my Goal for the Playground?

Given that I’m starting from scratch with Couchbase I had pretty simple goals.

  • Establish a clean pattern for Create, Read, Update and Delete operations using the Go SDK.
  • Come up with a data model simple enough to work with in my spare time but complex enough to give me nontrivial use cases.
  • Use the Go SDK enough to know what my code organization requirements might be.

If I’m successful, I’ll have a good idea of where to start with an entity package and a persistence service to deploy into our cluster.

My Preliminary Steps

There was a Go with Gin example application that went through some basic CRUD examples here https://developer.couchbase.com/tutorial-quickstart-golang-gin-gonic

I went through that, which was enough to give me confidence that I wouldn’t be wasting time in learning more. I did come out of that quick start application with some pretty big questions. Those questions gave me direction in exploring with my own quick start app.

Question 1 – Oh cool…SQL, do I have to?

The fine folks at Couchbase really tout the SQL++ (formerly N1QL) query engine they’ve built over the top of their document database. I’m sure that’s very comforting to someone landing in Couchbase as their first non-relational database. I like the idea for ad hoc queries. I can see myself writing SQL queries from DataGrip or the Couchbase Dashboard web application. My fear is being forced to use SQL queries as string literals in my Go code. I am not a fan of sqlString := "select * from whatever" all up in my Go. And that is exactly what I found in the quick start application, so I will definitely be exploring what my options are in my quick start app.

Question 2 – What are my options when I need relationships between entities?

Just because you’ve left relational databases behind doesn’t mean your data entities won’t have relationships. The design process with document databases has you continually asking “do I embed?”, “do I create a mapping collection?” or “do I add an array of ids into my document?”. I don’t suspect Couchbase will be any different in that respect. I think we’ll want to favor embedding as long as embedding doesn’t result in an unbounded array. I just want to have a better grip on the implementation details with Couchbase.

Question 3 – How does the Go SDK look compared to other supported languages?

It’s a question I still ask simply because it became a reflex in my early days as a Go developer. Fortunately Go adoption has reached the point that it’s generally not ignored. I still like to compare language support however. I want to know that they treat Go equally with other popular languages.

Question 4 – How does this thing use resources?

This question will probably be answered outside of this exercise. I want to get a service deployment completed so that I can push a collection or two into the billion document range. First of all, to see if I can get it there, and secondly to see what that looks like from a hosting standpoint.

My Playground Approach

Based on the questions I have, I came up with the following minimal data model to move forward with. I wrote a simple, throwaway Go application where I worked through different solutions based on what I could glean from the Couchbase Go SDK documentation. The data model I opted for is shown below.

Yeah, so I don’t know what this application is actually going to do, but every application I’ve ever worked on has some or all of this: people that we’re tracking, organizations they belong to, resources they all have, and events that happen that affect their resources. That pretty much sums it up, right? Of course in my pure aerospace days there were a lot more math shenanigans, but outside of that I’ve dealt with something like the above for decades now. So instead of doing a TODO app, let’s to-don’t and do this.

Question 1 – Do I have to use the SQL++ thing?

I was able to get away without SQL++ when I did a hard primary key search like the following:

func GetPerson(bucket *gocb.Bucket, email string) (*Person, error) {  
    collection := bucket.Scope(scopeName).Collection(personCollectionName)  

    id := HashKey(email)  
    result, err := collection.Get(id, &gocb.GetOptions{})  
    if err != nil {  
       return nil, err  
    }  
    var person Person  
    err = result.Content(&person)  
    if err != nil {  
       return nil, err  
    }  
    return &person, nil  
}

My primary key for the person collection was a base64 encoding of the email address. Yeah, I know that’s a recipe for disaster, but I need to get this moving so bear with me. This is an example of a predictable key versus a generated key. I like predictable keys because I can pick them to match my most often used search criteria. That eliminates that dreaded search ahead in as many cases as possible. So we’ve got that in the collection.Get method in the SDK.
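To make the predictable key idea concrete, here is a minimal sketch of what a helper like HashKey could look like. It is not the playground’s actual implementation: the trimming and lowercasing are assumptions, and so is the choice of the URL-safe base64 alphabet.

```go
package main

import (
	"encoding/base64"
	"strings"
)

// HashKey derives a predictable document key from an email address.
// Assumption: the email is trimmed and lowercased first, so that
// " Bob@Example.com" and "bob@example.com" land on the same document.
// Base64 is an encoding rather than a hash, so the key is reversible.
func HashKey(email string) string {
	normalized := strings.ToLower(strings.TrimSpace(email))
	// The URL-safe alphabet keeps "/" and "+" out of document keys.
	return base64.URLEncoding.EncodeToString([]byte(normalized))
}
```

Because the key is computed from something the caller already knows, a lookup is a single collection.Get with HashKey(email) and no query is needed first.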

One of the queries that I would predict we would need, however, is getting the people that belong to an organization. Spoiler alert on question 2: I opted to embed an array of Org Ids in the person document. I was unable to find a method that gave me that result without using the SQL++ method. This meant I had something like:

// GetPeopleByOrgID retrieves a list of people associated with a specific organization ID from Couchbase.
func GetPeopleByOrgID(bucket *gocb.Bucket, orgID string) ([]Person, error) {
    baseSelect := GetBasePersonSelect()
    // orgID is interpolated straight into the statement: fine for a
    // playground, not safe for untrusted input.
    sql := fmt.Sprintf(baseSelect+" WHERE '%s' IN orgIds", orgID)
    peopleScope := bucket.Scope(scopeName)
    results, err := peopleScope.Query(sql, &gocb.QueryOptions{})
    var people People
    if err != nil {
       return people, fmt.Errorf("query %s failed: %w", sql, err)
    }

    for results.Next() {
       var person Person
       err = results.Row(&person)
       if err != nil {
          return people, fmt.Errorf("error getting person from query: %w", err)
       }
       people = append(people, person)
    }
    // Surface any error hit while streaming rows.
    if err = results.Err(); err != nil {
       return people, err
    }
    return people, nil
}
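One way to at least keep the org ID out of the statement text is positional parameters. The sketch below only builds the statement and its parameter list, so it runs without a cluster. PeopleByOrgQuery is a hypothetical helper name, and handing the result to the SDK through the PositionalParameters field of gocb.QueryOptions is an assumption I have not run against a real bucket.

```go
package main

// PeopleByOrgQuery returns a SQL++ statement with a $1 placeholder and the
// parameter list that fills it. Hypothetical helper, not part of gocb.
// Intended use (assumption, untested here):
//
//	scope.Query(stmt, &gocb.QueryOptions{PositionalParameters: params})
func PeopleByOrgQuery(baseSelect, orgID string) (string, []interface{}) {
	// The org ID travels as data, so quotes inside it cannot alter the query.
	stmt := baseSelect + " WHERE $1 IN orgIds"
	return stmt, []interface{}{orgID}
}
```

The string literal is still there, but it becomes a fixed template rather than something assembled from input with fmt.Sprintf.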

There are some reflection shenanigans that I will get into later in this post that led me to the GetBasePersonSelect function. But, with all that obfuscation there is still a string literal to the effect of baseSelect := "select some, crappy, fields from mycrappytable" and boom I’m back in the land of what I don’t like about RDBMS. It’s not a deal breaker, I’m not quittin’ you Couchbase…but you’d better deliver on the in memory stuff!!!
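As a rough illustration of that kind of reflection, the sketch below builds a field list from json struct tags. Everything in it is a placeholder rather than the playground’s real code: the Person fields, the BaseSelect name, and the keyspace name are all assumed for the example.

```go
package main

import (
	"reflect"
	"strings"
)

// Person is a placeholder entity; the field set is assumed for illustration.
type Person struct {
	ID       string   `json:"id"`
	Email    string   `json:"email"`
	OrgIDs   []string `json:"orgIds"`
	Password string   `json:"-"` // excluded from JSON, so excluded from the select
}

// BaseSelect builds "SELECT p.a, p.b FROM <keyspace> AS p" from the json
// tags of a struct (or pointer to struct). Fields with no json tag, or
// tagged "-", are skipped. Non-struct input yields an empty string.
func BaseSelect(entity interface{}, keyspace string) string {
	t := reflect.TypeOf(entity)
	if t != nil && t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t == nil || t.Kind() != reflect.Struct {
		return ""
	}
	var cols []string
	for i := 0; i < t.NumField(); i++ {
		tag := t.Field(i).Tag.Get("json")
		name := strings.Split(tag, ",")[0] // drop options such as omitempty
		if name == "" || name == "-" {
			continue
		}
		cols = append(cols, "p."+name)
	}
	return "SELECT " + strings.Join(cols, ", ") + " FROM " + keyspace + " AS p"
}
```

The upside of this style is that the field list follows the struct, so adding a field to the entity does not mean hunting down a string literal. The downside is exactly the obfuscation complained about above.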

Question 2 – What’s my relational approach?

I’m sure I ruined it for you already in the previous section. I’m sure ChatGPT could have saved you the heartache, but I asked ChatGPT about this and it said “what are you even doing?” I didn’t have an answer so here we are. I tried to pick a data model that expressed the typical hijinks of database design: a collection with an unbounded one-to-many relationship, which would be the People to Organization relationship, and a time series relationship back to a static entity, which is the Resource to Event relationship. There is more in between those, but in this post we’re going to deal with those extremes.

One of the query scenarios I am definitely going to have to deal with is a user coming into the Organization page and asking for the people that belong to that organization. I think I’ve been pretty clear about my distaste for a string literal query approach, but this is what we have for that workflow scenario.

// GetPeopleByOrgID retrieves a list of people associated with a specific organization ID from Couchbase.
func GetPeopleByOrgID(bucket *gocb.Bucket, orgID string) ([]Person, error) {
    baseSelect := GetBasePersonSelect()
    // orgID is interpolated straight into the statement: fine for a
    // playground, not safe for untrusted input.
    sql := fmt.Sprintf(baseSelect+" WHERE '%s' IN orgIds", orgID)
    peopleScope := bucket.Scope(scopeName)
    results, err := peopleScope.Query(sql, &gocb.QueryOptions{})
    var people People
    if err != nil {
       return people, fmt.Errorf("query %s failed: %w", sql, err)
    }

    for results.Next() {
       var person Person
       err = results.Row(&person)
       if err != nil {
          return people, fmt.Errorf("error getting person from query: %w", err)
       }
       people = append(people, person)
    }
    // Surface any error hit while streaming rows.
    if err = results.Err(); err != nil {
       return people, err
    }
    return people, nil
}

From the Couchbase UI it looks something like:

Again, not a deal breaker but a little distasteful. I’d rather have something to the effect of:

var keysIWant []string
subArray := "OrgIds"
people.GetBySubDocKeys(bucket, subArray, keysIWant)

Maybe not exactly that but something to that effect. Some awareness by the SDK that I might need to do some operation on an embedded collection without a SQL++ statement.

While you’re at it give me the ability to update a sub doc without obliterating the parent. Something to the effect of

var (
    docKey       = [parent doc id]
    subDocKey    = [key of the embedded array to update]
    subDocUpdate = [the version of the item you want to replace in the embedded array]
)
people.UpdateSubDocById(bucket, docKey, subDocKey, subDocUpdate)

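Yeah, so I didn’t find that in my first pass through the SDK, so we have to do what I did. To be fair, gocb does ship a sub-document API (collection.LookupIn and collection.MutateIn) that I haven’t tried yet, and it may cover at least part of this.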

Now the final part of question 2 comes in the form of time series data. I have become very accustomed to representing time series data with a predictable key that takes a rough form of:

timeSeriesKey := keyType + ":" + collectionType + ":" + date + ":" + hour + ":" + subMeasurementPeriod
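A minimal sketch of building such a key is below. The details are assumptions made for the example rather than a fixed format: timestamps are treated as UTC, the sub-measurement period is the index of a fixed-width slice of the hour, and every numeric part is zero-padded so that keys sort lexicographically in time order.

```go
package main

import (
	"fmt"
	"time"
)

// TimeSeriesKey builds keyType:collectionType:date:hour:subPeriod.
// Assumptions: the timestamp is Unix seconds interpreted as UTC, and the
// sub-measurement period is the index of a periodMinutes-wide slice of the
// hour. Zero-padding keeps string order equal to time order.
func TimeSeriesKey(keyType, collectionType string, unixSeconds int64, periodMinutes int) string {
	if periodMinutes <= 0 {
		periodMinutes = 60 // fall back to a single period per hour
	}
	t := time.Unix(unixSeconds, 0).UTC()
	sub := t.Minute() / periodMinutes
	return fmt.Sprintf("%s:%s:%s:%02d:%02d",
		keyType, collectionType, t.Format("2006-01-02"), t.Hour(), sub)
}
```

For example, 13:47 UTC on 2024-01-01 with 15 minute periods falls in period 3 of hour 13, so every event in that quarter hour shares the same key prefix.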

Now for BigTable type stuff I’m used to being able to do index range operations something to the effect of

func readRowRange(w io.Writer, projectID, instanceID string, tableName string) error {
    // projectID := "my-project-id"
    // instanceID := "my-instance-id"
    // tableName := "mobile-time-series"

    ctx := context.Background()
    client, err := bigtable.NewClient(ctx, projectID, instanceID)
    if err != nil {
        return fmt.Errorf("bigtable.NewClient: %w", err)
    }
    defer client.Close()

    tbl := client.Open(tableName)
    err = tbl.ReadRows(ctx, bigtable.NewRange("phone#4c410523#20190501", "phone#4c410523#201906201"),
        func(row bigtable.Row) bool {
            printRow(w, row)
            return true
        },
    )

    if err != nil {
        return fmt.Errorf("tbl.ReadRows: %w", err)
    }

    return nil
}

For Couchbase I suspect I’m going to have to do something to the effect of:

sqlStatement := "select whatever from whatever where id like 'type:subtype:hour:%'"

Which, if it works, is not horrible, but I’m not diggin’ it. So in summary for question 2, there’s some work ahead to get over what might just be my distaste for an approach that performs as well as whatever semantics I’d prefer.
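If the LIKE approach turns out to be awkward, a range predicate on the document key is another option worth testing, since it is closer in spirit to the Bigtable range read. The sketch below only computes the bounds for a key prefix. PrefixRange is a hypothetical helper, and the statement in the trailing comment (using META().id, the SQL++ expression for the document key) is an untested assumption, as is the idea that plain ASCII keys compare the same way in the query engine as they do in Go.

```go
package main

// PrefixRange returns the half-open range [start, end) that covers every
// key beginning with prefix. end is the prefix with its last byte
// incremented, so any key k with that prefix satisfies start <= k < end.
// Assumes the last byte is not 0xFF, which holds for ASCII keys.
// An empty prefix returns two empty strings.
func PrefixRange(prefix string) (start, end string) {
	if prefix == "" {
		return "", ""
	}
	b := []byte(prefix)
	b[len(b)-1]++
	return prefix, string(b)
}

// Hypothetical statement to pair with the bounds as positional parameters:
//
//	SELECT e.* FROM event AS e WHERE META(e).id >= $1 AND META(e).id < $2
```

With a prefix of "resource:event:2024-01-01:" the end bound becomes "resource:event:2024-01-01;", which sorts after every key for that day and before any key for the next one.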

Question 3 – Quality of Go Support

The Go package for Couchbase is installed with

go get github.com/couchbase/gocb/v2@latest

and has a comprehensive Go doc at

https://pkg.go.dev/github.com/couchbase/gocb/v2#section-documentation

I went through both the Go and the .NET Quick Start applications and did not see any additional functionality with one language over the other. The release cadence of both packages appeared to be similar. Go definitely has first class status and is not an afterthought. The version of gocb that I used was v2.9.1, released July 18, 2024, which is roughly two months ago at the time I’m writing this.

Question 4 – Resources?

I need to build out the application a little more in order to get a sense of that aspect. I think with my playground app I’ve seen enough to move forward. I will move the code to a package and service and push more data and see where we land.

The Playground App

I can sum it up as building out the minimum viable entities and adding CRUD functions to each. I went a little deeper with the relationship between People and Organizations. I embedded organization ids inside the person document. I wanted to see the query pattern for that relationship. I also used the app to start to build a strategy around what configuration info I’d need to pass around. My goal is to have a pretty good idea of what I’d want in a private entity package and what I’d need in a persistence service.

There were no big surprises in the CRUD operations. It’s a familiar approach if you’ve worked with document or key value databases in the past. The hierarchy is basically buckets -> scopes -> collections -> documents, which is semantically different but conceptually familiar. You wind up with CRUD operations looking like:

func (p Person) Create(bucket *gocb.Bucket) (*gocb.MutationResult, error) {  
    collection := bucket.Scope(scopeName).Collection(personCollectionName)  
    p.ID = p.HashKey()  
    mutResult, err := collection.Insert(  
       p.ID,  
       p,  
       &gocb.InsertOptions{})  

    return mutResult, err  
}  

func (p Person) Update(bucket *gocb.Bucket) (*gocb.MutationResult, error) {  
    collection := bucket.Scope(scopeName).Collection(personCollectionName)  
    mutResult, err := collection.Replace(  
       p.ID,  
       p,  
       &gocb.ReplaceOptions{})  

    return mutResult, err  
}  

func (p Person) Delete(bucket *gocb.Bucket) (*gocb.MutationResult, error) {  
    collection := bucket.Scope(scopeName).Collection(personCollectionName)  
    p.ID = p.HashKey()  
    mutResult, err := collection.Remove(  
       p.ID,  
       &gocb.RemoveOptions{})  

    return mutResult, err  
}  

func GetPerson(bucket *gocb.Bucket, email string) (*Person, error) {  
    collection := bucket.Scope(scopeName).Collection(personCollectionName)  

    id := HashKey(email)  
    result, err := collection.Get(id, &gocb.GetOptions{})  
    if err != nil {  
       return nil, err  
    }  
    var person Person  
    err = result.Content(&person)  
    if err != nil {  
       return nil, err  
    }  
    return &person, nil  
}

Summary

I learned enough during this exercise to know I want to learn more. There are many things I did not explore yet. I did not explore the different options that are presented when creating a new bucket. I simply took the defaults. That means as we move forward we’ll have to get a deeper understanding of what all that means. So I went further than “hello couchbase” here and to be fair so did they in their quick start app. That’s why I’m building this stretch application. So in coming posts I’ll be pushing this further. I’ll be moving some of the code out of this playground app into a private Go package and a gRPC service that will perform the persistence operations. I’ll load a lot more data and we’ll see what the different bucket configuration options do.
