Big-O notation is the thing to research if you want to learn more; Wikipedia has a decent explanation. I'll only provide an executive summary here.
Let's say you implement a dialect of Shen that stores lists in a user-configurable data structure (pretend this dialect of Shen is used for a comp-sci project and this reconfigurability is used for benchmarking purposes in your research).
If you want to find an element in a traditional cons-list, that is O(n): regardless of how complex the software backing it is, you visit each item in the list one by one, linearly.
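To make that concrete, here's a minimal sketch in C (my own illustration, not Shen's or KLambda's actual representation) of a cons-list search; in the worst case every cell gets visited:

    #include <stddef.h>

    /* A classic two-word cons cell: payload plus pointer to the rest. */
    struct cons {
        int          car;
        struct cons *cdr;
    };

    /* O(n): we may touch every cell before finding (or missing) x. */
    struct cons *find(struct cons *list, int x)
    {
        for (; list != NULL; list = list->cdr)
            if (list->car == x)
                return list;
        return NULL;
    }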
If you use a vector, that is O(1), again regardless of how many CPU instructions are used, because it works by indexing: you calculate the offset into the vector (from an ordinal position you already know, or from a hash if you don't), then directly dereference the structure to get at the data. There is no faster means of accessing data, because this translates directly to the underlying CPU architecture's memory-reference instructions. For example, on x86 architectures, a single instruction like MOV EAX,[EBX+ECX*4] grabs the 32-bit element at the index held in ECX, starting at base address EBX in memory, and puts it into EAX. If the data is in cache, this is a 4-cycle operation. With CPUs running at 3GHz to 6GHz (in the case of IBM mainframes), you can easily see why nothing beats an array for access speed!
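In C terms the whole lookup is one expression; a compiler will typically lower v[i] to exactly the kind of scaled-index load shown above (a sketch, not tied to any particular Shen port):

    #include <stddef.h>

    /* O(1): one address computation and one load, no matter how
       large v is. On x86 this becomes a MOV with base+index*4
       addressing, as in MOV EAX,[EBX+ECX*4]. */
    int fetch(const int *v, size_t i)
    {
        return v[i];    /* address = v + i*4 for 32-bit ints */
    }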
There are algorithms which are O(log_k n) too. We could, for example, decide to store an association list internally as a skip-list or B-tree. When looking up an association, the runtime might start in the middle of the data set[1] and, based on a comparison there, exclude some (large!) portion of the data set, focusing only on what's left. This repeats recursively until either the data is exhausted or a match is found.
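Skip-lists and B-trees are more elaborate, but binary search over a sorted array shows the same halve-and-recurse idea; a sketch in C:

    #include <stddef.h>

    /* O(log n): each comparison discards half of what remains.
       Returns the index of x in the sorted array v, or -1. */
    ptrdiff_t find_sorted(const int *v, size_t n, int x)
    {
        size_t lo = 0, hi = n;              /* search [lo, hi) */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (v[mid] == x)
                return (ptrdiff_t)mid;
            else if (v[mid] < x)
                lo = mid + 1;               /* drop the lower half */
            else
                hi = mid;                   /* drop the upper half */
        }
        return -1;
    }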
There are some algorithms which are O(n^2), such as bubble sort: you end up visiting data elements often enough that as your data set doubles, you need four times the visits to arrive at a solution. At work not long ago, a coworker had to fix an algorithm that inadvertently turned out to be O(n^n)!
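For reference, here's the textbook bubble sort in C; the nested passes are where the n^2 comes from, while it needs only O(1) extra memory (which matters for the next point):

    #include <stddef.h>

    /* O(n^2) time, O(1) extra space: up to n passes, each sweeping
       up to n elements, so doubling n roughly quadruples the work. */
    void bubble_sort(int *v, size_t n)
    {
        for (size_t pass = 0; pass + 1 < n; pass++)
            for (size_t i = 0; i + 1 < n - pass; i++)
                if (v[i] > v[i + 1]) {
                    int tmp  = v[i];
                    v[i]     = v[i + 1];
                    v[i + 1] = tmp;
                }
    }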
Finally, note that big-O notation applies independently to run time and to resource consumption. An algorithm can be very simple, using O(1) memory but taking O(n^2) run time (e.g., bubble sort), or the reverse can happen (e.g., CPU page tables and other forms of tries). Then there are "constant factors" to worry about: given two O(1) algorithms, which is faster? Arrays and hash tables are both O(1), for instance, but hash tables carry additional hashing overhead.
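To illustrate the constant-factor point, compare these two O(1) lookups; the hash table here is a toy open-addressing sketch of my own, not any particular library:

    #include <stddef.h>
    #include <stdint.h>

    #define SLOTS 1024   /* hypothetical table size, power of two */

    /* Plain array: one scaled load and we're done. */
    int array_get(const int *v, size_t i) { return v[i]; }

    /* Toy hash table: still O(1) expected, but there's extra
       constant work (hash, mask, probe) before the final load. */
    struct slot { uint32_t key; int val; int used; };

    int hash_get(const struct slot *t, uint32_t key, int *out)
    {
        uint32_t h = key * 2654435761u;     /* multiplicative hash */
        for (size_t i = 0; i < SLOTS; i++) {
            const struct slot *s = &t[(h + i) & (SLOTS - 1)];
            if (!s->used)
                return 0;                   /* key absent */
            if (s->key == key) { *out = s->val; return 1; }
        }
        return 0;
    }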
And then there are hidden hardware factors too.
If your working set is below 32KiB in size, for example, and is "hot" in L1 cache, you may find that a bubble sort or linear search is actually faster than a qsort or binary tree walk for your needs. This is because when data is in L1 cache, access costs only 4 CPU clock cycles per datum, versus roughly 60 to 400 cycles (plus 4 per datum) once it's pushed out to L2/L3. If the data is in SDRAM, the cost goes up astronomically from there, and latency becomes variable because other devices compete for access to memory; I've heard estimates ranging from 800 cycles to over 2000, depending on the hardware configuration. This is especially relevant if you intend to run data-heavy applications on a machine with a lot of network I/O, such as a database server. The effect actually gets worse as CPU clock speeds and core counts increase: SDRAM access times seem fairly well stuck in the 35 to 70ns range, so the faster the CPU, the more clocks are wasted waiting for SDRAM to deliver its first byte of data.
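That's why a loop like the following, dumb as it looks, can beat a binary search on a small, L1-resident array: the accesses are sequential (prefetch-friendly) and the branch is predictable. A sketch, assuming the latencies above:

    #include <stddef.h>

    /* On a hot working set each iteration costs a handful of
       cycles; a binary search's scattered probes can each eat a
       cache miss, which dwarfs the extra comparisons done here. */
    ptrdiff_t scan(const int *v, size_t n, int x)
    {
        for (size_t i = 0; i < n; i++)
            if (v[i] == x)
                return (ptrdiff_t)i;
        return -1;
    }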
Q, being derived from APL, implements its "lists" as arrays of boxed objects under the hood. That is, a "list" like [1 2 3 4] will take up 4 words of memory, plus some fixed overhead for garbage collection and other metadata. Access to the vector contents is by direct indexing, both for access performance and for superior cache utilization. This is true as well for [1 2 "HELLO" ["wor" "ld!"]].
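A rough model in C of what "an array of boxed objects" means (my own sketch, not Q's actual internals):

    #include <stddef.h>

    /* Every element is a tagged reference, so the list itself is a
       flat, directly indexable array of uniform-size boxes. */
    enum tag { T_INT, T_STRING, T_LIST };

    struct boxed {
        enum tag tag;
        union {
            long         i;     /* T_INT    */
            const char  *s;     /* T_STRING */
            struct list *l;     /* T_LIST (nested list) */
        } as;
    };

    struct list {
        size_t        len;      /* GC/metadata header lives here */
        struct boxed *items;    /* items[k] is an O(1) index */
    };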
The runtime is smart enough to optimize a vector's representation as well. For example, that same list [1 2 3 4] can easily be seen to be a vector of integers, so Q stores it in memory as four unboxed integers, one after another. This is why Q's killer application, financial database queries, is so much faster than any competitor I'm aware of. Storing tables in a column-major manner allows queries to run at CPU/memory-bandwidth speeds, since it fully exploits cache locality.
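The column-major payoff is easy to see in C: summing one column of a struct-of-arrays layout streams through contiguous memory, while the row-major version strides over unrelated fields (the table and field names here are hypothetical):

    #include <stddef.h>

    /* Row-major: one struct per row. Summing `price` drags the
       whole row through cache, wasting most of each cache line. */
    struct trade { long time; double price; long size; };

    double sum_rows(const struct trade *t, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += t[i].price;
        return s;
    }

    /* Column-major: each column is its own contiguous unboxed
       array, so this loop runs at memory-bandwidth speed. */
    double sum_col(const double *price, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += price[i];
        return s;
    }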
As long as you store information as cons cells in memory, you'll never reach these levels of performance. Walking a lengthy list can thrash your L1 cache if the list nodes are not stored efficiently, and even if they're placed adjacent in memory, you'll still pack only half as much data into a cache line, leading to increased thrashing. Thankfully, Shen's memory model doesn't care about the underlying representation; that concern is delegated right to KLambda. I recall a few papers documenting how to abstract cons-lists as vectors with no loss of generality, a nice benefit coming out of the Lisp machine R&D of the mid-80s.
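The factor-of-two claim falls straight out of the sizes involved; assuming a 64-bit machine with 64-byte cache lines:

    #include <stdio.h>

    /* Each cons cell is two 8-byte words (car + cdr), so a 64-byte
       cache line carries only 4 payload words; a flat vector of
       longs packs 8 into the same line. */
    struct cons { long car; struct cons *cdr; };

    int main(void)
    {
        printf("payload longs per line, cons list: %zu\n",
               64 / sizeof(struct cons));       /* 4 */
        printf("payload longs per line, vector:    %zu\n",
               64 / sizeof(long));              /* 8 */
        return 0;
    }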