Database Kernel Development

Avenall Trejo

Aug 5, 2024, 4:02:09 AM
to sembgesptata
Our company culture is focused on helping our employees enable innovation by building breakthroughs together. How? We focus every day on building the foundation for tomorrow and creating a workplace that embraces differences, values flexibility, and is aligned to our purpose-driven and future-focused work. We offer a highly collaborative, caring team environment with a strong focus on learning and development, recognition for your individual contributions, and a variety of benefit options for you to choose from. Apply now!

SAP innovations help more than 400,000 customers worldwide work together more efficiently and use business insight more effectively. Originally known for leadership in enterprise resource planning (ERP) software, SAP has evolved to become a market leader in end-to-end business application software and related services for database, analytics, intelligent technologies, and experience management. As a cloud company with 200 million users and more than 100,000 employees worldwide, we are purpose-driven and future-focused, with a highly collaborative team ethic and commitment to personal development. Whether connecting global industries, people, or platforms, we help ensure every challenge gets the solution it deserves. At SAP, we build breakthroughs, together.


Qualified applicants will receive consideration for employment without regard to their age, race, religion, national origin, ethnicity, gender (including pregnancy and childbirth), sexual orientation, gender identity or expression, protected veteran status, or disability.

Successful candidates might be required to undergo a background verification with an external vendor.


Graduate-level course on the design and implementation of (relational) database system kernels, as well as other large-scale data management techniques. Reviews the relational data model (including relational algebra) and the relational query language, SQL. Examines in depth file organization, database storage, indexing and hashing, query evaluation and optimization, transaction processing, concurrency control and recovery, and database integrity and security (if schedule allows). In addition to the study of relational database kernels, this course also investigates the latest developments in other large-scale data management techniques, e.g., streaming algorithms, the MapReduce framework (in particular, the Hadoop system), and other I/O-efficient techniques (if time permits). Students will participate in a semester-long project and build a mini-database system by implementing several core modules of a relational database system. There might also be projects on other large-scale data management techniques, such as sketching and MapReduce-based projects, if time allows. In summary, this course is about the principles of designing and implementing database kernels, as well as other relevant large-scale data management techniques. Please note that this is NOT a course on building database applications or an introduction to database systems; i.e., we will not cover how to build a database application (e.g., ER design, schema refinement, functional dependencies, and database application development). Such topics are covered in CS 5530/6530.


Project 1: Implementing HeapPage for the disk manager in the DBMS.

Project 2: Implementing the buffer manager in the DBMS.

Project 3: Implementing the disk-based B+ tree in the DBMS.

Project 4: Implementing the external merge sort in the DBMS.
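To give a rough sense of what the HeapPage project involves, here is a minimal slotted-page sketch in C. The layout (a fixed header, a slot directory growing forward, and record bytes filling in from the end of the page) is one common HeapPage design, not the course's required interface; all type and function names below are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical slotted-page layout: header at the front, slot directory
 * growing forward, record data growing backward from the end of the page. */
typedef struct {
    uint16_t num_slots;   /* number of slot entries              */
    uint16_t free_end;    /* offset where free space ends        */
} PageHeader;

typedef struct {
    uint16_t offset;      /* record start within the page        */
    uint16_t length;      /* record length in bytes              */
} Slot;

typedef struct {
    uint8_t bytes[PAGE_SIZE];
} HeapPage;

static void heap_page_init(HeapPage *p) {
    PageHeader *h = (PageHeader *)p->bytes;
    h->num_slots = 0;
    h->free_end = PAGE_SIZE;
}

/* Insert a record; returns its slot number, or -1 if the page is full. */
static int heap_page_insert(HeapPage *p, const void *rec, uint16_t len) {
    PageHeader *h = (PageHeader *)p->bytes;
    Slot *slots = (Slot *)(p->bytes + sizeof(PageHeader));
    int dir_end = (int)(sizeof(PageHeader) + (h->num_slots + 1) * sizeof(Slot));
    if (dir_end + len > h->free_end)
        return -1;                          /* not enough free space */
    h->free_end -= len;
    memcpy(p->bytes + h->free_end, rec, len);
    slots[h->num_slots].offset = h->free_end;
    slots[h->num_slots].length = len;
    return h->num_slots++;
}

/* Read back the record stored in a slot; NULL if the slot is invalid. */
static const void *heap_page_get(const HeapPage *p, int slot, uint16_t *len) {
    const PageHeader *h = (const PageHeader *)p->bytes;
    const Slot *slots = (const Slot *)(p->bytes + sizeof(PageHeader));
    if (slot < 0 || slot >= h->num_slots)
        return NULL;
    *len = slots[slot].length;
    return p->bytes + slots[slot].offset;
}
```

A real disk manager would add record deletion, free-space compaction, and a page checksum; this sketch only shows the core insert/lookup bookkeeping.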

Additional Resources (Database Research): ACM SIGMOD, VLDB, IEEE ICDE


Another cool thing about Semantic Kernel is that prompts written for a Python version during app iteration can be used by the C# version for much faster execution at runtime. Semantic Kernel is also proven on Microsoft Azure for Copilot and has reference frameworks for developers to build their own scalable copilots with Azure.


By combining Semantic Kernel with Astra DB, developers can build powerful RAG applications with extended contextual conversation capabilities (such as managing chat and prompt histories) and multi-function or planner capabilities, on a globally scalable vector database proven to give more relevant and faster query responses.


The integration of Astra DB and Semantic Kernel extends beyond technical enhancements, paving the way for a range of business use cases from personalized customer service to intelligent product recommendations and beyond. It's not just about making development easier; it's about enabling the creation of more intelligent, responsive, and personalized AI applications that can transform industries.


When kernel services are invoked in the current process context, the kernel's layout opens the right path for exploring its internals in more detail. Our effort in this chapter is centered on understanding processes and the underlying ecosystem the kernel provides for them. We will explore these concepts throughout the chapter.


Quintessentially, computing systems are designed, developed, and often tweaked for running user applications efficiently. Every element that goes into a computing platform is intended to enable effective and efficient ways for running applications. In other words, computing systems exist to run diverse application programs. Applications can run either as firmware in dedicated devices or as a "process" in systems driven by system software (operating systems).


Multiple instances of the same program can exist with their respective memory allocations. For instance, for a web browser with multiple open tabs (running simultaneous browsing sessions), each tab is considered a process instance by the kernel, with unique memory allocations.


Modern-day computing platforms are expected to handle a plethora of processes efficiently. Operating systems thus must deal with allocating unique memory to all contending processes within the physical memory (often finite) and also ensure their reliable execution. With multiple processes contending and executing simultaneously (multi-tasking), the operating system must ensure that the memory allocation of every process is protected from accidental access by another process.


To address this issue, the kernel provides a level of abstraction between the process and physical memory called virtual address space. Virtual address space is the process's view of memory; it is how the running program views the memory.


Virtual address space creates an illusion that every process exclusively owns the whole memory while executing. This abstracted view of memory is called virtual memory and is achieved by the kernel's memory manager in coordination with the CPU's MMU. Each process is given a contiguous 32- or 64-bit address space, bound by the architecture and unique to that process. With each process caged into its virtual address space by the MMU, any attempt by a process to access an address region outside its boundaries will trigger a hardware fault, making it possible for the memory manager to detect and terminate violating processes, thus ensuring protection.
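This isolation can be observed from user space with a small POSIX experiment (illustrative, not kernel code): after fork(), parent and child see the same virtual addresses but have distinct physical copies, so a store in the child never appears in the parent.

```c
#include <sys/wait.h>
#include <unistd.h>

static int shared_value = 0;   /* lives in each process's own data segment */

/* Fork a child that overwrites shared_value, then check that the parent's
 * copy is untouched; returns 1 when both observations hold. */
static int child_write_is_private(void) {
    pid_t pid = fork();            /* child gets a copy of the address space */
    if (pid == 0) {
        shared_value = 42;         /* modifies only the child's copy        */
        _exit(shared_value);       /* report the child's view via exit code */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    /* Same virtual address in both processes, different physical backing:
     * the parent still sees its original value. */
    return shared_value == 0 && WEXITSTATUS(status) == 42;
}
```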


Modern operating systems not only prevent one process from accessing another but also prevent processes from accidentally accessing or manipulating kernel data and services (as the kernel is shared by all the processes).


Operating systems achieve this protection by segmenting the whole memory into two logical halves, the user and kernel space. This bifurcation ensures that all processes that are assigned address spaces are mapped to the user space section of memory and kernel data and services run in kernel space. The kernel achieves this protection in coordination with the hardware. While an application process is executing instructions from its code segment, the CPU is operating in user mode. When a process intends to invoke a kernel service, it needs to switch the CPU into privileged mode (kernel mode), which is achieved through special functions called APIs (application programming interfaces). These APIs enable user processes to switch into the kernel space using special CPU instructions and then execute the required services through system calls. On completion of the requested service, the kernel executes another mode switch, this time back from kernel mode to user mode, using another set of CPU instructions.


System calls are the kernel's interfaces to expose its services to application processes; they are also called kernel entry points. As system calls are implemented in kernel space, the respective handlers are provided through APIs in the user space. API abstraction also makes it easier and convenient to invoke related system calls.
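On Linux, the relationship between the user-space API wrapper and the underlying kernel entry point can be made visible with the raw syscall(2) interface. The sketch below invokes the same write service both ways; note that syscall() and the SYS_write constant are Linux-specific.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write a message twice: once through the libc API wrapper, once by
 * entering the kernel directly via its system-call number.
 * Returns 1 when both paths report the full length written. */
static int write_both_ways(void) {
    const char msg[] = "hello from user space\n";
    size_t len = strlen(msg);

    /* Usual path: the write() API wrapper performs the mode switch. */
    ssize_t a = write(STDOUT_FILENO, msg, len);

    /* Same kernel service, invoked by raw system-call number. */
    ssize_t b = syscall(SYS_write, STDOUT_FILENO, msg, len);

    return a == (ssize_t)len && b == (ssize_t)len;
}
```

Both calls land in the same kernel handler; the wrapper merely hides the system-call number and the mode-switch instructions.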


When a process requests a kernel service through a system call, the kernel will execute on behalf of the caller process. The kernel is now said to be executing in process context. Similarly, the kernel also responds to interrupts raised by other hardware entities; here, the kernel executes in interrupt context. When in interrupt context, the kernel is not running on behalf of any process.


Apart from the address space, a process in memory is also assigned a data structure called the process descriptor, which the kernel uses to identify, manage, and schedule the process. The following figure depicts process address spaces with their respective process descriptors in the kernel:


In Linux, a process descriptor is an instance of type struct task_struct, defined in <linux/sched.h>. It is one of the central data structures and contains all the attributes, identification details, and resource allocation entries that a process holds. Looking at struct task_struct is like peeking through a window into what the kernel sees or works with to manage and schedule a process.
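Since we cannot reproduce the real structure here, the following drastically simplified sketch only hints at the kind of bookkeeping struct task_struct performs. The field names echo a few well-known members (pid, state, comm), but this is in no way the kernel's definition, which runs to hundreds of fields.

```c
#include <string.h>

/* Drastically simplified, illustrative stand-in for struct task_struct;
 * NOT the kernel definition. */
struct task_sketch {
    int  pid;                    /* process identifier              */
    long state;                  /* running, sleeping, stopped, ... */
    char comm[16];               /* executable name                 */
    struct task_sketch *parent;  /* link to the parent task         */
};

static void task_init(struct task_sketch *t, int pid, const char *name) {
    t->pid = pid;
    t->state = 0;                            /* 0 ~ runnable here */
    strncpy(t->comm, name, sizeof(t->comm) - 1);
    t->comm[sizeof(t->comm) - 1] = '\0';
    t->parent = 0;
}
```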


Since the task structure contains a wide set of data elements, which are related to the functionality of various kernel subsystems, it would be out of context to discuss the purpose and scope of all the elements in this chapter. We shall consider a few important elements that are related to process management.


To manage PIDs, the kernel uses a bitmap. This bitmap allows the kernel to keep track of PIDs in use and assign a unique PID for new processes. Each PID is identified by a bit in the PID bitmap; the value of a PID is determined from the position of its corresponding bit. Bits with value 1 in the bitmap indicate that the corresponding PIDs are in use, and those with value 0 indicate free PIDs. Whenever the kernel needs to assign a unique PID, it looks for the first unset bit and sets it to 1, and conversely to free a PID, it toggles the corresponding bit from 1 to 0.
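The bitmap scheme just described can be sketched in a few lines of C. This is an illustrative first-fit bit allocator, not the kernel's actual implementation (which adds locking, configurable ranges, and PID namespaces).

```c
#include <stdint.h>

#define MAX_PIDS 4096

/* One bit per PID: bit set = PID in use. */
static uint32_t pid_bitmap[MAX_PIDS / 32];

/* Find the first clear bit, set it to 1, and return its position as the
 * new PID; returns -1 when every PID is taken. */
static int alloc_pid(void) {
    for (int i = 0; i < MAX_PIDS; i++) {
        uint32_t mask = 1u << (i % 32);
        if (!(pid_bitmap[i / 32] & mask)) {
            pid_bitmap[i / 32] |= mask;
            return i;
        }
    }
    return -1;
}

/* Toggle the PID's bit back to 0, marking it free for reuse. */
static void free_pid(int pid) {
    pid_bitmap[pid / 32] &= ~(1u << (pid % 32));
}
```

Note how freeing a PID makes it the first candidate for the next allocation, exactly as the bit-position-equals-PID convention implies.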
