Check out any of Guru’s product launch blog posts and you’ll notice a recurring theme: improving the search experience for our customers. And for good reason: with a dedicated search team of data scientists, product managers, and engineers, search and knowledge discoverability in Guru are always being tested and improved. Like any technology company with search functionality, it’s a foundational part of Guru that we will always seek to refine. While search enhancements might not be as “flashy” as UI changes, AI enhancements, or new features, they still pack a punch, significantly improving a user’s experience with our product. Today we’re catching up with our search team to see what they’ve been working on over the past few months.
Thank you three for joining us today! To get started, can you tell us a little bit about yourselves and what you do on Guru’s Search Pod?
Nina: I’m a data scientist on the Search Pod, so I focus on figuring out which machine learning methods we can experiment with to improve search. Recently, I’ve been focused on how we can incorporate the way Cards (the format information is documented in within Guru) are used (viewing, copying the link or content, favoriting) into our search algorithm. Moving forward, I’ll be looking into how we can better understand users’ intent while searching, to make sure we’re bringing them the most relevant Cards.
Laura: I’m a product manager for the Search Pod, so I spend a lot of time with our customers to get their feedback and understand what is most helpful and important to them. Then, I bring this back to the team, so that we can make decisions on how to improve and evolve search over time. I plan our short, medium, and long-term goals so that we can make improvements continuously on multiple aspects of search.
Jenna: I’m also a data scientist on the Search Pod, and I focus on our algorithm specifically. Right now, I’m focused on our internal tooling that allows us to experiment with different algorithm adjustments and understand how they could impact search results for our customers. I also do data analysis to compare how our search is currently performing vs. how it would perform with potential changes.
The last time we caught up with the Search Pod, we talked about upcoming changes to our algorithm and the ways in which we test search enhancements. Can you tell us a little bit about how that work’s been going?
Laura: Our recent changes have been around taking Card usage into account as another factor for finding the most relevant and useful results.
Nina: The idea stemmed from wanting to understand how Card usage data could impact AI work at Guru in general. Before applying these questions to search specifically, we explored how Card “popularity” correlated with usefulness in a hackathon project!
Jenna: Card usage falls under our larger focus on the Search Pod of bringing in new data sources that can help us understand Card relevance. So usage would be a data source, as well as the work Nina is doing to understand intent.
At the start, we knew that we had a lot of data about the ways Cards were being used across teams, and we hypothesized that user behavior around Cards could inform enhancements to search.
Nina: I think it’s important to note that search isn’t just matching up key terms — it’s also understanding the context of where and when Cards are being used.
Laura: We look at Card usage to help our users in other areas of the product — for example, you can see usage data around Cards waiting for your verification in “My Tasks.”
We also have popularity scores across the app — these usage data points are meant to help users understand what information is most critical for their team.
Bringing that data into search helps us make that a more universal experience.
Jenna: This also helps us make sure search results are helpful and dynamic — for example, maybe a Card’s content doesn’t change much over the course of a year, but the usage increases dramatically during the same time frame. This may indicate that the Card is becoming increasingly more useful for the team, and search results should reflect that.
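As a rough illustration of the idea Jenna describes, usage can act as a boost on top of text relevance, so a Card whose content hasn’t changed can still climb in the results as its views grow. The function name and weights below are hypothetical stand-ins, not Guru’s actual algorithm:

```python
import math

def score_card(text_relevance, views_last_90_days, usage_weight=0.3):
    """Blend a text-match score with a usage signal (hypothetical weights).

    log1p dampens the usage signal so a hugely popular Card
    can't drown out a much better textual match, and a Card
    with zero views keeps its plain text-relevance score.
    """
    usage_boost = usage_weight * math.log1p(views_last_90_days)
    return text_relevance * (1.0 + usage_boost)

# Two Cards with identical textual relevance: the one whose
# usage grew dramatically now ranks higher.
rarely_used = score_card(text_relevance=0.8, views_last_90_days=2)
rising = score_card(text_relevance=0.8, views_last_90_days=500)
```

Dampening the usage term (here with a logarithm) is one common way to keep popularity from overwhelming relevance; the real balance would come out of experimentation.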
Can you tell us how the pod makes decisions on whether or not to move forward with changes?
Jenna: The pod is very experimental in our approach, and we have a variety of levels for experiments. Our environments for testing are completely isolated from customer accounts, and there are several rounds of testing that an experiment must “pass” before we even consider releasing the changes to our customers. Because of our experimental setup, we’re able to test changes really rapidly, and be more confident about the changes that we ultimately do deploy to our customers.
Nina: I’d also add that all of these experiments are extremely data-driven. We’ll work on several trials of a change at once, and then use data to understand which had the best impact on results. For example, we recently ran a sprint with 110 experiments of varying granularity and complexity, 2 of which we ended up moving forward with based on the results. Sometimes it takes dozens of experiments to decide on a change; sometimes it takes more.
Laura: All of our metrics are centered around having the most relevant results as high up on the results list as possible. But because of the variety of our customer teams and the content in their accounts, we have to go through this rigorous testing to ensure that we’ll see positive results across our entire customer base.
Jenna: Every experiment we run simulates hundreds of thousands of searches, which allows us to simulate the search volume we need to say with confidence that a change will positively impact customers across the board.
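Jenna’s high-volume offline experiments could be sketched like this: replay a set of judged queries through two ranker variants and compare how often the known-relevant Card lands near the top. The rankers and judgment data below are toy stand-ins for Guru’s internal tooling, not its real implementation:

```python
def top_k_hit_rate(ranker, judged_queries, k=3):
    """Fraction of queries whose known-relevant Card appears in the top k.

    `judged_queries` maps each query string to the id of the Card
    judged most relevant for it; `ranker` returns an ordered list
    of Card ids for a query.
    """
    hits = sum(
        1 for query, relevant_id in judged_queries.items()
        if relevant_id in ranker(query)[:k]
    )
    return hits / len(judged_queries)

# Toy rankers standing in for "current algorithm" vs. "proposed change".
baseline = lambda query: ["card-2", "card-7", "card-1"]
candidate = lambda query: ["card-1", "card-2", "card-7"]

judged = {"reset password": "card-1", "expense policy": "card-1"}
```

Run at the scale of hundreds of thousands of replayed searches, a comparison like this is what lets a team say with confidence that a candidate beats the baseline before anything ships.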
Once we do roll out changes to our users, how do we measure their success in helping them find what they need?
Laura: One of the biggest ways we monitor how search is performing for customers is by watching a set of metrics that we've put together. There are a number of industry standard metrics for search that center around precision and recall that we use to get an overall picture of how things are going. These are formulas that help us measure if we're returning relevant content and if it is easy for searchers to find what they need in the list of results (i.e. it's near the top). We then look at more targeted metrics that show us how things are going for different types of searches. So we’ll look at how a proposed change impacts those metrics, and then as a lagging indicator, customer feedback. Depending on the change, we may or may not expect (and get) a lot of customer feedback, but the expectation is that they feel the impact of the changes by being able to find what they need faster and with less friction.
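The industry-standard measures Laura mentions, relevance of what’s returned and whether the right answer sits near the top, are commonly captured by precision@k and reciprocal rank. This is a generic sketch of those formulas, not Guru’s exact metric suite:

```python
def precision_at_k(results, relevant, k):
    """Share of the top k results that are actually relevant."""
    return sum(1 for r in results[:k] if r in relevant) / k

def reciprocal_rank(results, relevant):
    """1 / position of the first relevant result; 0 if none appear.

    Averaged over many queries this is mean reciprocal rank (MRR),
    which rewards putting a relevant Card at the very top.
    """
    for position, result in enumerate(results, start=1):
        if result in relevant:
            return 1.0 / position
    return 0.0

results = ["card-9", "card-1", "card-4"]   # ranked search output
relevant = {"card-1", "card-4"}            # human-judged relevant Cards
```

Here the first relevant Card appears at position 2, so the reciprocal rank is 0.5, and two of the top three results are relevant, so precision@3 is 2/3.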
Jenna: We’re basically trying to answer two questions: one, are we surfacing useful Cards? And two, are we avoiding surfacing irrelevant Cards? Another way we evaluate impact is by looking at user behavior after their results have been surfaced — are they searching again? Viewing more Cards? This provides helpful insight into the success of their results.
We’ll end on my favorite question — what’s next for Guru’s search?
Laura: Continuous improvement! I think of search work as having two main areas: the algorithm, and the user experience of the search process. Right now, we’re more focused on the algorithm, but we consider both aspects to be important.
Long-term, we want to incorporate more context into search — including a user’s anticipated usage based on what team they’re on, how they interact with other Cards, etc. — to provide a more personalized search experience.
Nina: We also want to use machine learning to understand the intent behind a user’s search. Sometimes there’s a gap between what a user actually types and what they’re looking for. For example, a user might search for "sales compensation" while the relevant Card uses the term "commission," so we’ll work to use machine learning to close those gaps.
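One simple way to start closing the vocabulary gap Nina describes is query expansion with a synonym map; learned approaches such as embeddings generalize the same idea. The mapping below is a hypothetical, hand-written example:

```python
# Hypothetical synonym map; a production system would learn these
# relationships (e.g. from embeddings or click data) rather than
# hand-curate them.
SYNONYMS = {
    "compensation": {"commission", "salary", "pay"},
    "pto": {"vacation", "time off"},
}

def expand_query(query):
    """Add known synonyms so a search for 'sales compensation'
    can also match Cards that only ever say 'commission'."""
    terms = set(query.lower().split())
    for term in list(terms):
        terms |= SYNONYMS.get(term, set())
    return terms
```

The expanded term set would then feed into the normal matching step, letting the searcher’s wording and the Card author’s wording meet in the middle.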
Jenna: Ultimately, all of this comes with the caveat of testing. As we test all of these possible changes, we can say with confidence that we'll never roll out anything that doesn't demonstrate improvement in our experimentation framework.